c# - Select Distinct Count is really slow

Reposted. Author: 行者123. Updated: 2023-11-30 15:22:30

I have a loop over roughly 7000 objects, and inside the loop I need to get a distinct count over a list of structs. At the moment I am using:

foreach (var product in productsToSearch)
{
    Console.WriteLine("Time elapsed: {0} start", stopwatch.Elapsed);
    var cumulativeCount = 0;
    productStore.Add(product);
    var orderLinesList = totalOrderLines
        .Where(myRows => productStore.Contains(myRows.Sku))
        .Select(myRows => new OrderLineStruct
        {
            OrderId = myRows.OrderId,
            Sku = myRows.Sku
        });
    var differences = totalOrderLines.Except(orderLinesList);
    cumulativeCount = totalOrderLinesCount - differences.Select(x => x.OrderId).Distinct().Count();
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
    Console.WriteLine("Time elapsed: {0} end", stopwatch.Elapsed);
}

public struct OrderLineStruct
{
    public string OrderId { get; set; }
    public string Sku { get; set; }
}

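This is very slow when getting the distinct count. Does anyone know a more efficient way of doing this? I have tried MoreLinq, which has a DistinctBy method for LINQ, but when I timed it, it was no more efficient. I have played around with PLINQ, but I am a little unsure where to parallelize this query.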

So each iteration of the loop is timed at:
Time elapsed: 00:00:37.1142047 start
Time elapsed: 00:00:37.8310148 end

= 0.7168101 seconds * 7000 = 5017.6707 seconds (83.627845 minutes)

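The line that takes the most time is the Distinct().Count() line (around 0.5 seconds). The differences variable holds a few hundred thousand OrderLineStruct values, so any LINQ query over it is slow.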

Update

I have made some modifications to the loop, and it now runs in around 10 minutes rather than over an hour:

foreach (var product in productsToSearch)
{
    var cumulativeCount = 0;
    productStore.Add(product);
    var orderLinesList = totalOrderLines
        .Join(productStore, myRows => myRows.Sku, p => p, (myRows, p) => myRows)
        .Select(myRows => new OrderLineStruct
        {
            OrderId = myRows.OrderId,
            Sku = myRows.Sku
        });
    totalOrderLines = totalOrderLines.Except(orderLinesList).ToList();
    cumulativeCount = totalOrderLinesCount - totalOrderLines.Select(x => x.OrderId).Distinct().Count();
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
}

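Having a .ToList() on the Except seems to make a difference, and I am now removing the already processed orders after each iteration, which speeds up every subsequent iteration.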

Best answer

You are looking for the problem in the wrong place.

orderLinesList, differences and differences.Select(x => x.OrderId).Distinct() are just chained LINQ to Objects query methods with deferred execution; the Count() method is what actually executes them.
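To make the deferred execution point concrete, here is a minimal sketch. The class name, the sample order IDs and the call counter are all made up for illustration; only Where, Distinct and Count are real LINQ methods. It records how many times the predicate has run before and after Count() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original question or answer.
public static class DeferredExecutionDemo
{
    // Returns the number of predicate calls before Count(), after Count(), and the count itself.
    public static (int BeforeCount, int AfterCount, int Result) Run()
    {
        var orderIds = new List<string> { "A", "B", "A", "C" };
        int predicateCalls = 0;

        // Building the query runs nothing yet: Where and Distinct are deferred.
        IEnumerable<string> query = orderIds
            .Where(id => { predicateCalls++; return id != "C"; })
            .Distinct();

        int before = predicateCalls; // no element has been inspected so far

        // Count() enumerates the whole chain, so a profiler bills all the work to this line.
        int result = query.Count();

        return (before, predicateCalls, result);
    }
}
```

This is why timing the Distinct().Count() line alone is misleading: it pays for the Where, Select and Except that were set up earlier in the loop body.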

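Your processing algorithm is highly inefficient. The bottleneck is the orderLinesList query, which iterates the whole totalOrderLines list for each product, and it is chained into (included in) the Except, Distinct and so on. Again, all of that happens inside the loop, i.e. 7000+ times.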
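One way to avoid rescanning the full list for every product is to index the order lines by SKU once, up front. The sketch below is only an illustration of that idea, not the code from the question: SkuIndexSketch and its OrderLine struct are hypothetical stand-ins, while ToLookup and ILookup are standard LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are assumptions made for this sketch.
public static class SkuIndexSketch
{
    // Stand-in for the question's OrderLineStruct.
    public struct OrderLine
    {
        public string OrderId;
        public string Sku;
    }

    // One pass over all lines builds a SKU -> order IDs index.
    public static ILookup<string, string> BuildIndex(IEnumerable<OrderLine> lines)
    {
        return lines.ToLookup(l => l.Sku, l => l.OrderId);
    }

    // Looking up a SKU no longer scans the full list.
    // A lookup built by ToLookup yields an empty sequence for an unknown key.
    public static int CountLinesFor(ILookup<string, string> index, string sku)
    {
        return index[sku].Count();
    }
}
```

With the index built once outside the loop, the work per product is proportional to that product's own lines rather than to the size of totalOrderLines.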

Here is a sample of an efficient algorithm that, IMO, does the same thing:

Console.WriteLine("Time elapsed: {0} start", stopwatch.Elapsed);
// Group join: collect the order lines of every product in a single pass
var productInfo =
(
    from product in productsToSearch
    join line in totalOrderLines on product equals line.Sku into orderLines
    select new { Product = product, OrderLines = orderLines }
).ToList();
// For every order, remember the index of the last product that touches it
var lastIndexByOrderId = new Dictionary<string, int>();
for (int i = 0; i < productInfo.Count; i++)
{
    foreach (var line in productInfo[i].OrderLines)
        lastIndexByOrderId[line.OrderId] = i; // Last wins
}
int cumulativeCount = 0;
for (int i = 0; i < productInfo.Count; i++)
{
    var product = productInfo[i].Product;
    foreach (var line in productInfo[i].OrderLines)
    {
        int lastIndex;
        // Count an order once, at the last product that references it
        if (lastIndexByOrderId.TryGetValue(line.OrderId, out lastIndex) && lastIndex == i)
        {
            cumulativeCount++;
            lastIndexByOrderId.Remove(line.OrderId);
        }
    }
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
    // Remove the next if it was just to support your processing
    productStore.Add(product);
}
Console.WriteLine("Time elapsed: {0} end", stopwatch.Elapsed);
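For anyone who wants to check the idea on small data, here is a self-contained sketch that runs the same last-index technique next to a brute-force version of the original loop. Everything here is hypothetical scaffolding (the CumulativeOrderCount class, its OrderLine struct, the method names), and it rests on assumptions not stated in the question: the products are distinct, every SKU in the order lines appears in the product list, and totalOrderLinesCount is read as the number of distinct orders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical harness; names and types are assumptions made for this sketch.
public static class CumulativeOrderCount
{
    // Stand-in for the question's OrderLineStruct.
    public readonly struct OrderLine
    {
        public OrderLine(string orderId, string sku) { OrderId = orderId; Sku = sku; }
        public string OrderId { get; }
        public string Sku { get; }
    }

    // Last-index technique: an order is counted at the last product position that touches it.
    public static List<int> LastIndex(IList<string> products, IList<OrderLine> lines)
    {
        var linesBySku = lines.ToLookup(l => l.Sku);
        var lastIndexByOrderId = new Dictionary<string, int>();
        for (int i = 0; i < products.Count; i++)
            foreach (var line in linesBySku[products[i]])
                lastIndexByOrderId[line.OrderId] = i; // last wins

        var result = new List<int>(products.Count);
        int cumulative = 0;
        for (int i = 0; i < products.Count; i++)
        {
            foreach (var line in linesBySku[products[i]])
            {
                if (lastIndexByOrderId.TryGetValue(line.OrderId, out int last) && last == i)
                {
                    cumulative++;
                    lastIndexByOrderId.Remove(line.OrderId); // count each order only once
                }
            }
            result.Add(cumulative);
        }
        return result;
    }

    // Brute force: total distinct orders minus orders that still have an uncovered line.
    public static List<int> BruteForce(IList<string> products, IList<OrderLine> lines)
    {
        var seen = new HashSet<string>();
        int totalOrders = lines.Select(l => l.OrderId).Distinct().Count();
        var result = new List<int>(products.Count);
        foreach (var product in products)
        {
            seen.Add(product);
            int uncovered = lines
                .Where(l => !seen.Contains(l.Sku))
                .Select(l => l.OrderId)
                .Distinct()
                .Count();
            result.Add(totalOrders - uncovered);
        }
        return result;
    }
}
```

Calling both methods with the same products and lines and comparing the returned lists with SequenceEqual is a cheap sanity check before swapping the fast version into real code. If the real data contains SKUs that never appear in productsToSearch, the two versions can disagree, so that case is worth testing separately.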

Regarding "c# - Select Distinct Count is really slow", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35581479/
