Remove duplicates from a List<Object> using LINQ

Article URL: https://www.itbaoku.cn/post/1556895.html

Problem description

The question is straightforward, and a simple search on SO turns up a number of similar questions, but I'm still struggling to find an answer.

I want to remove duplicate objects from List<Column> columns based on all of their properties. The Column class itself has a List<string> property, and I think this is where the problem lies, probably in the GetHashCode() or Equals part.

Here is the full code I've written, but I'm not getting the correct results. For example, in the code below I want to remove column3 because it is the same as column1 in every respect.

using System;
using System.Collections.Generic;
using System.Linq;

namespace Arrays
{
    public class ColumnList
    {
        public static void RemoveDuplicateColumnTypes()
        {
            // define columns
            var column1 = new Column
            {
                StartElevation = 0,
                EndElevation = 310,
                ListOfSections = new List<string> { "C50", "C40" }
            };
            var column2 = new Column
            {
                StartElevation = 0,
                EndElevation = 310,
                ListOfSections = new List<string> { "C50", "C30" }
            };
            var column3 = new Column
            {
                StartElevation = 0,
                EndElevation = 310,
                ListOfSections = new List<string> { "C50", "C40"}
            };

            // list of columns
            var columns = new List<Column> { column1, column2, column3 };

            var result = columns.Distinct(new ColumnListComparer());
        }
    }

    public class ColumnListComparer : IEqualityComparer<Column>
    {
        public bool Equals(Column x, Column y)
        {
            if (x == null || y == null) return false;

            if (Math.Abs(x.StartElevation - y.StartElevation) < 0.001 &&
                Math.Abs(x.EndElevation - y.EndElevation) < 0.001 &&
                x.ListOfSections.SequenceEqual(y.ListOfSections))
            {
                return true;
            }
            return false;
        }

        public int GetHashCode(Column obj)
        {
            return obj.StartElevation.GetHashCode() ^
                   obj.EndElevation.GetHashCode() ^
                   obj.ListOfSections.GetHashCode();
        }
    }

    public class Column
    {
        public double StartElevation { get; set; }
        public double EndElevation { get; set; }
        public List<string> ListOfSections { get; set; }
    }

}

Recommended answer

The hash codes of the two lists are not going to match, so the hash code of the whole Column won't match either.
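This is easy to see by comparing two separately built lists that have the same contents. The following is a minimal sketch, and the class name `ListHashDemo` is hypothetical. `SequenceEqual` reports that the contents match. `List<T>` overrides neither `Equals` nor `GetHashCode`, so both of those calls fall back to the instance-based behavior inherited from `object`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: List<T> compares and hashes by reference, not by contents.
public static class ListHashDemo
{
    public static (bool SameContents, bool SameInstance, bool ListEquals, int Hash1, int Hash2) Compare()
    {
        var a = new List<string> { "C50", "C40" };
        var b = new List<string> { "C50", "C40" };

        return (
            a.SequenceEqual(b),              // element-by-element comparison: true
            object.ReferenceEquals(a, b),    // two separate list objects: false
            a.Equals(b),                     // List<T> does not override Equals: false
            a.GetHashCode(),                 // inherited from object, tied to the instance
            b.GetHashCode());                // almost always different from Hash1
    }
}
```

Printing `Hash1` and `Hash2` will usually show two unrelated numbers. That mismatch is what breaks the original `GetHashCode`.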

Try this:

public int GetHashCode(Column obj)
{
    return 42;
}

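Hash codes must be equal if the objects are equal. Returning a fixed number is not ideal in general, because every object lands in the same bucket and Distinct has to call Equals against every item it has already kept. It is still entirely valid, though. Why 42? I'm a Hitchhiker's Guide fan.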
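Here is a minimal sketch of the constant-hash comparer plugged into `Distinct`, using the question's `Column` shape. It is wrapped in a hypothetical `ConstantHashDemo` class so that it compiles on its own. The `Create` and `RemoveDuplicates` helpers are also hypothetical, and the null handling in `Equals` is an addition. Every column hashes to 42, so `Distinct` decides purely through `Equals`, and `column3` is dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper so the snippet compiles on its own.
public static class ConstantHashDemo
{
    public sealed class Column
    {
        public double StartElevation { get; set; }
        public double EndElevation { get; set; }
        public List<string> ListOfSections { get; set; } = new List<string>();

        public static Column Create(double start, double end, params string[] sections) =>
            new Column { StartElevation = start, EndElevation = end, ListOfSections = sections.ToList() };
    }

    public sealed class ColumnListComparer : IEqualityComparer<Column>
    {
        // Same comparison as the question, with null handling added.
        public bool Equals(Column x, Column y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x is null || y is null) return false;
            return Math.Abs(x.StartElevation - y.StartElevation) < 0.001 &&
                   Math.Abs(x.EndElevation - y.EndElevation) < 0.001 &&
                   x.ListOfSections.SequenceEqual(y.ListOfSections);
        }

        // Valid but slow: every column shares one bucket, so Distinct
        // ends up calling Equals against every column it has already kept.
        public int GetHashCode(Column obj) => 42;
    }

    public static List<Column> RemoveDuplicates(IEnumerable<Column> columns) =>
        columns.Distinct(new ColumnListComparer()).ToList();
}
```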

If that works, then you can look for a better hash function that actually uses some of the values in the list.

How complex you make the hash function is up to you, as long as you follow the golden rule: if two objects are equal, their hash codes must be equal.

Here, for example, is a valid hash function that only takes the length of the list into account:

public int GetHashCode(Column obj)
{
    return obj.ListOfSections.Count;
}
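The golden rule can also be spot-checked mechanically on sample data. The following is a minimal sketch, and `HashContract.Holds` is a hypothetical helper, not a framework API. For every pair that the comparer reports as equal, it confirms that the two hash codes match. A brute-force pairwise check like this is only meant for tests or debugging on small samples.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not a framework API): brute-force check of the rule
// "if comparer.Equals(a, b) then comparer.GetHashCode(a) == comparer.GetHashCode(b)".
public static class HashContract
{
    public static bool Holds<T>(IEqualityComparer<T> comparer, IReadOnlyList<T> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i + 1; j < items.Count; j++)
            {
                if (comparer.Equals(items[i], items[j]) &&
                    comparer.GetHashCode(items[i]) != comparer.GetHashCode(items[j]))
                {
                    return false; // equal objects with different hash codes: contract broken
                }
            }
        }
        return true;
    }
}
```

In the question's setting, this could be called as `HashContract.Holds(new ColumnListComparer(), columns)`. The constant and `Count`-based hashes pass it by construction.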

Other recommended answer

Change your GetHashCode as follows:

public int GetHashCode(Column obj)
{
    unchecked
    {
        var hashCode = obj.StartElevation.GetHashCode();
        hashCode = (hashCode * 397) ^ obj.EndElevation.GetHashCode();
        foreach (var item in obj.ListOfSections)
        {
            hashCode = (hashCode * 397) ^ item.GetHashCode();
        }
        return hashCode;
    }
}

Your version doesn't work because ListOfSections.GetHashCode() is not based on the list's contents. List<T> does not override GetHashCode, so the call returns the reference-based hash inherited from object, which is meant for reference equality. Here we're dealing with value equality, so you must generate the hash code from the values themselves.
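Here is a minimal sketch of that `GetHashCode` combined with the question's `Equals` into one complete comparer. It is wrapped in a hypothetical `ValueHashDemo` class with a hypothetical `Create` helper, and the null handling in `Equals` is an addition. Because the hash is built from the list's elements, `column1` and `column3` now hash identically, and `Distinct` keeps two columns. Note that it still hashes the raw double values, so it shares the tolerance problem raised in the answer below.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper so the snippet compiles on its own.
public static class ValueHashDemo
{
    public sealed class Column
    {
        public double StartElevation { get; set; }
        public double EndElevation { get; set; }
        public List<string> ListOfSections { get; set; } = new List<string>();
    }

    public static Column Create(double start, double end, params string[] sections) =>
        new Column { StartElevation = start, EndElevation = end, ListOfSections = sections.ToList() };

    public sealed class ColumnListComparer : IEqualityComparer<Column>
    {
        public bool Equals(Column x, Column y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x is null || y is null) return false;
            return Math.Abs(x.StartElevation - y.StartElevation) < 0.001 &&
                   Math.Abs(x.EndElevation - y.EndElevation) < 0.001 &&
                   x.ListOfSections.SequenceEqual(y.ListOfSections);
        }

        // Hash built from the list's contents, not from the list instance.
        public int GetHashCode(Column obj)
        {
            unchecked
            {
                var hashCode = obj.StartElevation.GetHashCode();
                hashCode = (hashCode * 397) ^ obj.EndElevation.GetHashCode();
                foreach (var item in obj.ListOfSections)
                {
                    hashCode = (hashCode * 397) ^ item.GetHashCode();
                }
                return hashCode;
            }
        }
    }
}
```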

Other recommended answer

Here is the culprit:

return obj.StartElevation.GetHashCode() ^
       obj.EndElevation.GetHashCode() ^
       obj.ListOfSections.GetHashCode(); // <<== This line

Lists do not base their hash code on the hash codes of their elements. Two different lists with identical elements would not have the same hash code. Change this to a line that aggregates the hash codes of the list members to make it work:

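// The int seed (17) is required: without it, Aggregate's accumulator is a string and 31*p does not compile.
return 31*31*obj.StartElevation.GetHashCode() +
       31*obj.EndElevation.GetHashCode() +
       obj.ListOfSections.Aggregate(17, (p, v) => 31*p + v.GetHashCode());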

Note: Although this will make your code eliminate column3, your code would remain invalid. The reason is not so obvious. Your Equals method is not transitive, because you compare doubles by treating them as equal whenever their difference is within a tolerance of 0.001. As a consequence,

Cell1 == Cell2 && Cell2 == Cell3

no longer implies that

Cell1 == Cell3

This is fundamentally wrong.
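The non-transitivity is easy to reproduce with three elevations that are spaced just under the tolerance apart. The following is a minimal sketch. The `ToleranceDemo` class and its `NearlyEqual` helper are hypothetical, and they mirror the question's `< 0.001` check.

```csharp
using System;

// Hypothetical demo of why a tolerance-based equality is not transitive.
public static class ToleranceDemo
{
    // Same rule as the question's Equals: "equal" if within 0.001.
    public static bool NearlyEqual(double a, double b) => Math.Abs(a - b) < 0.001;

    public static (bool AB, bool BC, bool AC) Check()
    {
        double a = 0.0, b = 0.0006, c = 0.0012;
        // a~b and b~c hold, but a and c are 0.0012 apart.
        return (NearlyEqual(a, b), NearlyEqual(b, c), NearlyEqual(a, c));
    }
}
```

`Check()` returns `(true, true, false)`: a equals b and b equals c, yet a does not equal c.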

Moreover, it means that two objects that your algorithm considers equal may have different hash codes. The contract of the equality comparer prohibits this.
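A concrete case: 0.0 and 0.0005 count as equal under the 0.001 tolerance, but `double.GetHashCode()` is computed from the exact bit pattern, so the two values get different hash codes. The following is a minimal sketch, and the `HashMismatchDemo` class is hypothetical.

```csharp
using System;

// Hypothetical demo: "equal" under the 0.001 tolerance, yet different hash codes.
public static class HashMismatchDemo
{
    public static (bool ConsideredEqual, bool SameHash) Check()
    {
        double a = 0.0, b = 0.0005;

        bool consideredEqual = Math.Abs(a - b) < 0.001;      // true: the question's rule
        bool sameHash = a.GetHashCode() == b.GetHashCode();  // false: double hashes its bit pattern

        return (consideredEqual, sameHash);
    }
}
```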

In order to fix this problem, switch away from representing elevations as double, and use int or long to store elevations expressed in units that are 1000 times smaller than the units that you currently have. In other words, if your code stores an elevation as a double 123.456, the new code should store an integer 123456. This would let you compare for equality with the right degree of tolerance. When you obtain the elevation for external use, cast the number to double and divide by 1000 to produce the old result.
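Here is a minimal sketch of that fixed-point approach. The names `FixedPointDemo`, `ToFixed`, `ToDouble` and `Create` are hypothetical. Rounding with `Math.Round` is also an assumption, because the answer does not say how the conversion should be done. Rounding is safer than a plain cast, since `123.456 * 1000` is not guaranteed to be exactly `123456.0` in binary floating point. The multiplier 31 in the hash is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: elevations stored as whole thousandths (123.456 -> 123456).
public static class FixedPointDemo
{
    public sealed class Column
    {
        public long StartElevation { get; set; }
        public long EndElevation { get; set; }
        public List<string> ListOfSections { get; set; } = new List<string>();
    }

    // Round rather than truncate when converting from the old double representation.
    public static long ToFixed(double elevation) =>
        (long)Math.Round(elevation * 1000, MidpointRounding.AwayFromZero);

    // For external use: back to the old double representation.
    public static double ToDouble(long fixedElevation) => fixedElevation / 1000.0;

    public static Column Create(double start, double end, params string[] sections) =>
        new Column { StartElevation = ToFixed(start), EndElevation = ToFixed(end), ListOfSections = sections.ToList() };

    public sealed class ColumnListComparer : IEqualityComparer<Column>
    {
        // Exact integer comparisons are transitive.
        public bool Equals(Column x, Column y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x is null || y is null) return false;
            return x.StartElevation == y.StartElevation &&
                   x.EndElevation == y.EndElevation &&
                   x.ListOfSections.SequenceEqual(y.ListOfSections);
        }

        // Uses exactly the fields Equals compares, so equal columns always hash equally.
        public int GetHashCode(Column obj)
        {
            unchecked
            {
                int hash = obj.StartElevation.GetHashCode();
                hash = hash * 31 + obj.EndElevation.GetHashCode();
                foreach (var section in obj.ListOfSections)
                {
                    hash = hash * 31 + section.GetHashCode();
                }
                return hash;
            }
        }
    }
}
```

One trade-off remains. Two elevations less than 0.001 apart can still round to different integers (for example 0.0004 becomes 0 while 0.0006 becomes 1). That is the price of an equality that is transitive and consistent with the hash code.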