c# - Poor performance when comparing collections of objects using reflection and LINQ Except/Intersect

Reposted. Author: 行者123. Updated: 2023-12-03 19:29:04
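I have written an application that compares two collections of objects (of the same type) and works out the similarities and differences by comparing the objects on their property values (or on combinations of their properties). The application was never intended to scale beyond 10000 objects in either collection, and it was accepted that this was a long-running operation. The business requirements have now changed, and we need to be able to compare up to 50000 objects (with a stretch target of 100000) in either collection.

Below is a minimal example of the type to be compared.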

    internal class Employee
    {
        public string ReferenceCode { get; set; }
    }


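To do this, I wrote a custom equality comparer for this type that takes the property name as a constructor parameter. The reason for parameterising it was to avoid writing a different equality comparer for every property of every type (there are quite a few of them, and this sounded like a neat solution).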

    public class EmployeeComparerDynamic : IEqualityComparer<Employee>
    {
        string PropertyNameToCompare { get; set; }
        public EmployeeComparerDynamic(string propertyNameToCompare)
        {
            PropertyNameToCompare = propertyNameToCompare;
        }

        public bool Equals(Employee x, Employee y)
        {
            return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null
                && x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
                    .Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
        }

        public int GetHashCode(Employee x)
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetHashCode();
                return hash;
            }
        }
    }


Using this equality comparer, I have been comparing the collections of objects with the LINQ Intersect and Except functions.

    var intersectingEmployeesLinq = firstEmployeeList
        .Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();

    var deltaEmployeesLinq = firstEmployeeList
        .Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();


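This all worked fine until the scaling requirement was raised, at which point I noticed that my application performed very poorly with large numbers of objects.

Initially I thought this was normal and that the total time to complete would simply increase considerably. However, when I tried looping through one list manually and checking whether each item exists in the other list, I noticed that my own implementation of what LINQ Except/Intersect do in the context of my application produced the same result but performed much better.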

    var intersectingEmployeesManual = new List<Employee>();

    foreach (var employee in firstEmployeeList)
    {
        if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
            intersectingEmployeesManual.Add(employee);
    }


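This performed much better (about 30 times) than the implementation in the earlier snippet. Of course, the earlier snippet uses reflection to get the value of the property, so I tried that as well.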

    var intersectingEmployeesManual = new List<Employee>();

    foreach (var employee in firstEmployeeList)
    {
        if (secondEmployeeList.Any(x => x.GetType()
                .GetProperty("ReferenceCode").GetValue(x)
                .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
            intersectingEmployeesManual.Add(employee);
    }


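This still performed about 2-3 times better. Finally, I wrote another equality comparer, but instead of parameterising the property it compares on a predefined property of the type.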

    public class EmployeeComparerManual : IEqualityComparer<Employee>
    {
        public bool Equals(Employee x, Employee y)
        {
            return y.ReferenceCode != null
                && x.ReferenceCode.Equals(y.ReferenceCode);
        }

        public int GetHashCode(Employee x)
        {
            unchecked
            {
                int hash = 17;
                hash = hash * 23 + x.ReferenceCode.GetHashCode();
                return hash;
            }
        }
    }


And the corresponding code to work out the intersecting and delta objects.

    var intersectingEmployeesLinqManual = firstEmployeeList
        .Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();

    var deltaEmployeesLinqManual = firstEmployeeList
        .Except(secondEmployeeList, new EmployeeComparerManual()).ToList();


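With this implementation I finally started getting the scaling I needed. Beyond that, I also ran some benchmarks on 10 different machines. The results are shown below (averages in milliseconds, rounded to the nearest millisecond).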

    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
    |       | List Items  | Intersect | Intersect Dynamic | Except | Except Dynamic | Intersect Linq | Intersect Linq Dynamic | Except Linq | Except Linq Dynamic |
    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+
    | Run 1 | 5000/4000   |       479 |              7440 |    340 |           7439 |              1 |                  14583 |           2 |               15257 |
    | Run 2 | 10000/8000  |      2177 |             32489 |   1282 |          29290 |              1 |                  59154 |           2 |               74170 |
    | Run 3 | 20000/16000 |      6758 |            116266 |   4578 |         116720 |              5 |                 225960 |           3 |              295146 |
    | Run 4 | 50000/40000 |     34457 |            720023 |  30693 |         731690 |             14 |                1483084 |          14 |             1657832 |
    +-------+-------------+-----------+-------------------+--------+----------------+----------------+------------------------+-------------+---------------------+


So, my summary so far is:


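- Using reflection to get the value of a property adds overhead by a factor of between 15 and 20.
- Using reflection inside an equality comparer together with LINQ Except/Intersect adds a further 2-3 times overhead.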
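
For context on the "Linq" columns, a hash-based intersect only stays close to O(n + m) when the comparer's `GetHashCode` spreads items across buckets. The following is a simplified sketch of that idea, not the actual framework source; `SetSketch` and `HashIntersect` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SetSketch
{
    // Simplified sketch of a hash-based intersect (hypothetical helper,
    // not the framework implementation).
    public static IEnumerable<T> HashIntersect<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        // One pass over 'second'. Each insertion calls GetHashCode once and
        // calls Equals only against entries whose hash code matches.
        var set = new HashSet<T>(second, comparer);

        foreach (var item in first)
        {
            // Remove succeeds only the first time a match is found,
            // so duplicates in 'first' are returned once.
            if (set.Remove(item))
                yield return item;
        }
    }
}
```

If every item reports the same hash code, each lookup degrades to a linear scan with `Equals`, and the whole operation behaves like the nested loops in the manual version, plus the cost of the comparer itself.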


My unresolved questions are:


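- Does using reflection to get the value of a property really add this much overhead, or am I missing a piece of the puzzle here?
- Why do I only get the promised O(n + m) overall effort when using LINQ with an equality comparer that does not use reflection?
- Is there any hope of finding a way to use one equality comparer per type and somehow parameterise the property to compare on, instead of one equality comparer per property per type?
- Side question: why does using reflection in an equality comparer together with LINQ Except/Intersect add extra overhead compared to my own basic implementation (which just loops through the lists comparing everything)?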


Finally, here is a full reproducible example:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            StackOverflow();
        }

        private static void StackOverflow()
        {
            var firstEmployeeList = CreateEmployeeList(5000);
            var secondEmployeeList = CreateEmployeeList(4000);

            var intersectingEmployeesManual = new List<Employee>();
            var sw = new Stopwatch();

            //Intersecting employees - comparing predefined property
            sw.Start();
            foreach (var employee in firstEmployeeList)
            {
                if (secondEmployeeList.Any(x => x.ReferenceCode == employee.ReferenceCode))
                    intersectingEmployeesManual.Add(employee);
            }
            sw.Stop();
            Console.WriteLine("Intersecting Employees Manual: " + sw.ElapsedMilliseconds);
            intersectingEmployeesManual.Clear();
            sw.Reset();

            //Intersecting employees - comparing dynamic property
            sw.Start();
            foreach (var employee in firstEmployeeList)
            {
                if (secondEmployeeList.Any(x => x.GetType()
                        .GetProperty("ReferenceCode").GetValue(x)
                        .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
                    intersectingEmployeesManual.Add(employee);
            }
            sw.Stop();
            Console.WriteLine("Intersecting Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            //Delta Employees - comparing predefined property
            var deltaEmployeesManual = new List<Employee>();
            sw.Start();
            foreach (var employee in firstEmployeeList)
            {
                if (secondEmployeeList.All(x => x.ReferenceCode != employee.ReferenceCode))
                    deltaEmployeesManual.Add(employee);
            }
            sw.Stop();
            Console.WriteLine("Delta Employees Manual: " + sw.ElapsedMilliseconds);
            sw.Reset();
            deltaEmployeesManual.Clear();

            //Delta Employees - comparing dynamic property
            sw.Start();
            foreach (var employee in firstEmployeeList)
            {
                if (secondEmployeeList
                        .All(x => !x.GetType().GetProperty("ReferenceCode").GetValue(x)
                            .Equals(employee.GetType().GetProperty("ReferenceCode").GetValue(employee))))
                    deltaEmployeesManual.Add(employee);
            }
            sw.Stop();
            Console.WriteLine("Delta Employees Manual (dynamic property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            //Intersecting employees Linq - dynamic property
            sw.Start();
            var intersectingEmployeesLinq = firstEmployeeList
                .Intersect(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
            sw.Stop();
            Console.WriteLine("Intersecting Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            //Intersecting employees Linq - manual property
            sw.Start();
            var intersectingEmployeesLinqManual = firstEmployeeList
                .Intersect(secondEmployeeList, new EmployeeComparerManual()).ToList();
            sw.Stop();
            Console.WriteLine("Intersecting Employees Linq (manual property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            //Delta employees Linq - dynamic property
            sw.Start();
            var deltaEmployeesLinq = firstEmployeeList
                .Except(secondEmployeeList, new EmployeeComparerDynamic("ReferenceCode")).ToList();
            sw.Stop();
            Console.WriteLine("Delta Employees Linq (dynamic property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            //Delta employees Linq - manual property
            sw.Start();
            var deltaEmployeesLinqManual = firstEmployeeList
                .Except(secondEmployeeList, new EmployeeComparerManual()).ToList();
            sw.Stop();
            Console.WriteLine("Delta Employees Linq (manual property): " + sw.ElapsedMilliseconds);
            sw.Reset();

            Console.WriteLine("Finished");
            Console.ReadLine();
        }

        private static List<Employee> CreateEmployeeList(int numberToCreate)
        {
            var employeList = new List<Employee>();
            for (var i = 0; i < numberToCreate; i++)
            {
                employeList.Add(new Employee
                {
                    ReferenceCode = i.ToString()
                });
            }
            return employeList;
        }

        internal class Employee
        {
            public string ReferenceCode { get; set; }
        }

        public class EmployeeComparerDynamic : IEqualityComparer<Employee>
        {
            string PropertyNameToCompare { get; set; }
            public EmployeeComparerDynamic(string propertyNameToCompare)
            {
                PropertyNameToCompare = propertyNameToCompare;
            }

            public bool Equals(Employee x, Employee y)
            {
                return y.GetType().GetProperty(PropertyNameToCompare).GetValue(y) != null
                    && x.GetType().GetProperty(PropertyNameToCompare).GetValue(x)
                        .Equals(y.GetType().GetProperty(PropertyNameToCompare).GetValue(y));
            }

            public int GetHashCode(Employee x)
            {
                unchecked
                {
                    int hash = 17;
                    hash = hash * 23 + x.GetType().GetProperty(PropertyNameToCompare).GetValue(x).GetHashCode();
                    return hash;
                }
            }
        }

        public class EmployeeComparerManual : IEqualityComparer<Employee>
        {
            public bool Equals(Employee x, Employee y)
            {
                return y.ReferenceCode != null
                    && x.ReferenceCode.Equals(y.ReferenceCode);
            }

            public int GetHashCode(Employee x)
            {
                unchecked
                {
                    int hash = 17;
                    hash = hash * 23 + x.ReferenceCode.GetHashCode();
                    return hash;
                }
            }
        }
    }


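Edit:

With the help of the suggestion to use a delegate in the equality comparer, and the observation that I was not computing the hash code correctly in the dynamic equality comparer, I was able to conclude the following: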


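- Reflection does add overhead, but the poor performance of my LINQ Except/Intersect was caused by the dynamic equality comparer, specifically by the fact that I computed the hash code by calling GetHashCode() on the property rather than on the property's value.
- Using a delegate-based equality comparer gives a performance boost, and the usage syntax stays neat and concise.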
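
The effect of the first point can be made visible by counting how often `Equals` is called under each hashing strategy. This is a minimal sketch under stated assumptions: `Item`, `CountingComparer` and `HashDemo` are hypothetical names that do not appear in the question, and the comparer caches the `PropertyInfo` purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for Employee.
public class Item { public string Code { get; set; } }

// Hypothetical comparer that counts Equals calls for a chosen hashing strategy.
public class CountingComparer : IEqualityComparer<Item>
{
    private static readonly PropertyInfo Prop = typeof(Item).GetProperty("Code");
    private readonly bool hashTheValue;
    public int EqualsCalls { get; private set; }

    public CountingComparer(bool hashTheValue) { this.hashTheValue = hashTheValue; }

    public bool Equals(Item x, Item y)
    {
        EqualsCalls++;
        return object.Equals(Prop.GetValue(x), Prop.GetValue(y));
    }

    public int GetHashCode(Item obj)
    {
        // Hashing the PropertyInfo yields the same number for every item;
        // hashing the property value spreads items across buckets.
        return hashTheValue ? Prop.GetValue(obj).GetHashCode() : Prop.GetHashCode();
    }
}

public static class HashDemo
{
    // Runs Intersect over two lists of n items and reports the Equals call count.
    public static int CountEqualsCalls(bool hashTheValue, int n)
    {
        var first = Enumerable.Range(0, n).Select(i => new Item { Code = i.ToString() }).ToList();
        var second = Enumerable.Range(0, n).Select(i => new Item { Code = i.ToString() }).ToList();
        var comparer = new CountingComparer(hashTheValue);
        first.Intersect(second, comparer).ToList();
        return comparer.EqualsCalls;
    }
}
```

Calling `HashDemo.CountEqualsCalls(false, n)` and `HashDemo.CountEqualsCalls(true, n)` for the same `n` should show the constant-hash variant making far more `Equals` calls, each of which pays for reflection again in the original comparer.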


I have now implemented the equality comparer below:

    public static class Compare
    {
        public static IEqualityComparer<TSource> By<TSource, TIdentity>(Func<TSource, TIdentity> identitySelector)
        {
            return new DelegateComparer<TSource, TIdentity>(identitySelector);
        }

        public static IEnumerable<T> IntersectBy<T, TIdentity>(this IEnumerable<T> source, IEnumerable<T> second, Func<T, TIdentity> identitySelector)
        {
            return source.Intersect(second, By(identitySelector));
        }

        private class DelegateComparer<T, TIdentity> : IEqualityComparer<T>
        {
            private readonly Func<T, TIdentity> identitySelector;

            public DelegateComparer(Func<T, TIdentity> identitySelector)
            {
                this.identitySelector = identitySelector;
            }

            public bool Equals(T x, T y)
            {
                return Equals(identitySelector(x), identitySelector(y));
            }

            public int GetHashCode(T obj)
            {
                return identitySelector(obj).GetHashCode();
            }
        }
    }


This works nicely with the following usage syntax:

    var intersectingEmployeesDelegate = firstEmployeeList
        .IntersectBy(secondEmployeeList, x => x.ReferenceCode).ToList();
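
The question only shows the intersect side. A delta counterpart could be sketched in the same way; `KeyComparer` and `ExceptByKey` are hypothetical names, not part of the original code. Newer versions of .NET ship their own `Enumerable.ExceptBy`, which takes a sequence of keys as its second argument, so the sketch deliberately uses a different name to avoid clashing with it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key-based comparer, similar in spirit to DelegateComparer.
public sealed class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyComparer(Func<T, TKey> keySelector) { this.keySelector = keySelector; }

    public bool Equals(T x, T y) => keyComparer.Equals(keySelector(x), keySelector(y));

    // The default comparer returns 0 for a null key instead of throwing.
    public int GetHashCode(T obj) => keyComparer.GetHashCode(keySelector(obj));
}

public static class KeyedSetExtensions
{
    // Items of 'source' whose key does not occur among the keys of 'second'.
    public static IEnumerable<T> ExceptByKey<T, TKey>(
        this IEnumerable<T> source, IEnumerable<T> second, Func<T, TKey> keySelector)
    {
        return source.Except(second, new KeyComparer<T, TKey>(keySelector));
    }
}
```

With this in place, a call such as `firstEmployeeList.ExceptByKey(secondEmployeeList, x => x.ReferenceCode)` would mirror the `IntersectBy` usage above.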


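The only open question I have left is whether there is a neat way to run this comparison over all properties of a given type.

My original implementation was something like the following: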

    foreach (var pInfo in typeof(Employee).GetProperties())
    {
        var intersectingEmployees = firstEmployeeList
            .Intersect(secondEmployeeList,
                new EmployeeComparerDynamic(pInfo.Name)).ToList();
    }


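Any ideas on how this could be implemented using the delegate comparer?

Best answer

When you get all the properties through reflection, you have to use the solution suggested by usr: build an expression tree, compile it into a delegate, and pass that delegate to the comparer's constructor. The code could look something like this: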

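    // Requires: using System.Linq.Expressions; using System.Reflection;
    // DelegateEqualityComparer<,> is not defined in this post; it is assumed to be
    // a public generic comparer whose first constructor takes a Func<T, TQ>.
    public static IEqualityComparer<T> GetComparer<T>(PropertyInfo propertyInfo)
    {
        // Build the lambda x => x.Property for the given property.
        Type tT = typeof(T);
        ParameterExpression paramExpr = Expression.Parameter(tT);
        MemberExpression memberExpr = Expression.Property(paramExpr, propertyInfo);
        LambdaExpression lambdaExpr = Expression.Lambda(memberExpr, paramExpr);

        // Close the generic comparer type over T and the property type.
        Type tQ = memberExpr.Type;
        Type te = typeof(DelegateEqualityComparer<,>);
        Type te2 = te.MakeGenericType(new Type[] { tT, tQ });
        ConstructorInfo ci = te2.GetConstructors()[0];

        // Compile the lambda once and hand the delegate to the comparer.
        Object i = ci.Invoke(new object[] { lambdaExpr.Compile() });

        return (IEqualityComparer<T>)i;
    }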
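
Since the answer's snippet relies on a comparer type that is not shown, here is a self-contained sketch of the same idea. It is an illustration rather than the answerer's code: `GetterComparer` and `PerPropertyIntersect` are hypothetical names, and the getter is typed as `Func<T, object>` (which boxes value-type properties) so that no generic type has to be constructed at run time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical comparer that compares items on one property via a compiled getter.
public sealed class GetterComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> getter;

    public GetterComparer(PropertyInfo property)
    {
        // Build x => (object)x.Property once and compile it to a delegate,
        // so no reflection call is made per comparison.
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.Convert(Expression.Property(x, property), typeof(object));
        getter = Expression.Lambda<Func<T, object>>(body, x).Compile();
    }

    public bool Equals(T a, T b) => object.Equals(getter(a), getter(b));

    // Hash the property value (0 for null), not the PropertyInfo.
    public int GetHashCode(T obj) => getter(obj)?.GetHashCode() ?? 0;
}

public static class PerPropertyIntersect
{
    // Runs Intersect once per readable, non-indexer public property of T.
    // Both sequences are enumerated once per property.
    public static Dictionary<string, List<T>> IntersectOnEachProperty<T>(
        IEnumerable<T> first, IEnumerable<T> second)
    {
        var result = new Dictionary<string, List<T>>();
        foreach (PropertyInfo p in typeof(T).GetProperties())
        {
            if (!p.CanRead || p.GetIndexParameters().Length > 0) continue;
            result[p.Name] = first.Intersect(second, new GetterComparer<T>(p)).ToList();
        }
        return result;
    }
}
```

Compiling an expression is itself relatively expensive, so in a long-running application it may be worth caching one comparer per property instead of rebuilding it on every call.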

Regarding "c# - Poor performance when comparing collections of objects using reflection and LINQ Except/Intersect", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48612365/
