C# Parallel.For and uninitialized arrays

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:54:54

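The scenario: inside a Parallel.For, an array is used in a non-parallel for loop. Every element of the array is overwritten, so technically there is no need to allocate and initialize it (which, as far as I can infer from C# tutorials, always happens on construction):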

float[] result = new float[16384];
System.Threading.Tasks.Parallel.For(0, 16384, x =>
{
    int[] histogram = new int[32768]; // allocation and initialization with all 0's, no?
    for (int i = 0; i < histogram.Length; i++)
    {
        histogram[i] = some_func(); // each element in histogram[] is written anew
    }
    result[x] = do_something_with(histogram);
});
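
On the question in the code comment: yes, C# sets every element of a newly allocated array to the default value of its element type, which is 0 for int. A minimal check, where ZeroInitDemo and IsAllZero are hypothetical names used only for this sketch:

```csharp
using System;
using System.Linq;

public static class ZeroInitDemo
{
    // Returns true when every element of a freshly allocated int[] is 0.
    public static bool IsAllZero(int length)
    {
        int[] fresh = new int[length];   // no explicit initialization
        return fresh.All(v => v == 0);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllZero(32768)); // prints "True"
    }
}
```

So the per-iteration cost in the loop above is the allocation plus the zeroing, followed by the overwrite of every element.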

In sequential code the fix is simple: pull the array out in front of the outer for loop:

float[] result = new float[16384];
int[] histogram = new int[32768]; // allocation and initialization with all 0's, done once
for (int x = 0; x < 16384; x++)
{
    for (int i = 0; i < histogram.Length; i++)
    {
        histogram[i] = some_func();
    }
    result[x] = do_something_with(histogram);
}

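Now neither an allocation nor a futile zeroing happens inside the outer loop. In the parallel version, however, this would surely be a bad move: either the parallel workers clobber each other's histogram results, or C# is smart enough to lock histogram and thereby shuts down all parallelism. Allocating a histogram[16384,32768] is equally wasteful. What I am trying now is the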
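
To put a number on how wasteful the histogram[16384,32768] variant would be, here is a small worked calculation. It assumes int elements, as in the first snippet; HistogramCost and Bytes are hypothetical names for this sketch.

```csharp
using System;

public static class HistogramCost
{
    // Bytes needed for a rows x cols table of int.
    // The cast to long keeps the multiplication from overflowing 32-bit arithmetic.
    public static long Bytes(int rows, int cols) => (long)rows * cols * sizeof(int);

    public static void Main()
    {
        long bytes = Bytes(16384, 32768);
        Console.WriteLine(bytes);                           // prints 2147483648
        Console.WriteLine(bytes / (1024.0 * 1024 * 1024));  // prints 2 (GiB)
    }
}
```

That is 2 GiB for data of which only one row per worker is needed at any moment.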

public static ParallelLoopResult For<TLocal>(
int fromInclusive,
int toExclusive,
Func<TLocal> localInit,
Func<int, ParallelLoopState, TLocal, TLocal> body,
Action<TLocal> localFinally
)

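library construct (function?), but since this is my first attempt at parallel programming in C#, I am full of doubts. Is the following a correct translation of the sequential case?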

float[] result = new float[16384];
System.Threading.Tasks.Parallel.For<short[]>(0, 16384,
    () => new short[32768],
    (x, loopState, histogram) =>
    {
        for (int i = 0; i < histogram.Length; i++)
        {
            histogram[i] = some_func();
        }
        result[x] = do_something_with(histogram);
        return histogram;
    }, (histogram) => { });
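
One way to check such a translation is to run it against the sequential version on deterministic inputs. The sketch below does that with int[] buffers. FillValue and Summarize are hypothetical stand-ins for some_func and do_something_with (they take the indices as arguments so that the result is reproducible), and the class and method names are made up for this example. Parallel.For invokes localInit once per participating task rather than once per iteration, so each task reuses a single buffer.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ThreadLocalHistogram
{
    // Hypothetical stand-in for some_func(): a deterministic value per (x, i).
    static int FillValue(int x, int i) => (x + i) % 7;

    // Hypothetical stand-in for do_something_with(): the sum of the histogram.
    static float Summarize(int[] histogram) => histogram.Sum();

    public static float[] RunSequential(int outer, int size)
    {
        var result = new float[outer];
        var histogram = new int[size];            // allocated once
        for (int x = 0; x < outer; x++)
        {
            for (int i = 0; i < histogram.Length; i++)
                histogram[i] = FillValue(x, i);
            result[x] = Summarize(histogram);
        }
        return result;
    }

    public static float[] RunParallel(int outer, int size)
    {
        var result = new float[outer];
        Parallel.For<int[]>(0, outer,
            () => new int[size],                  // localInit: one buffer per task
            (x, loopState, histogram) =>
            {
                for (int i = 0; i < histogram.Length; i++)
                    histogram[i] = FillValue(x, i);
                result[x] = Summarize(histogram); // each x writes only its own slot
                return histogram;                 // reused by the next iteration of this task
            },
            histogram => { });                    // nothing to merge or release
        return result;
    }

    public static void Main()
    {
        bool same = RunSequential(256, 4096).SequenceEqual(RunParallel(256, 4096));
        Console.WriteLine(same);
    }
}
```

The buffer still holds the previous iteration's values when the body starts, which is harmless here only because every element is overwritten before it is read.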

Best answer

I'm not entirely sure what you are asking, but let's look at the starting point:

public void Original()
{
    float[] result = new float[16384];
    System.Threading.Tasks.Parallel.For(0, 16384, x =>
    {
        int[] histogram = new int[32768]; // allocation and initialization with all 0's, no?
        for (int i = 0; i < histogram.Length; i++)
        {
            histogram[i] = some_func(); // each element in histogram[] is written anew
        }
        result[x] = do_something_with(histogram);
    });
}

The inner loop produces a histogram, and the outer loop takes that histogram and uses it to produce a single value in result.

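One solution that is easy to work with is to do this processing with TPL Dataflow, an abstraction on top of the TPL. To set it up, we need a few DTOs to pass through the dataflow pipeline.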

public class HistogramWithIndex
{
    public HistogramWithIndex(IEnumerable<int> histogram, int index)
    {
        Histogram = histogram;
        Index = index;
    }
    public IEnumerable<int> Histogram { get; }
    public int Index { get; }
}

public class IndexWithHistogramSize
{
    public IndexWithHistogramSize(int index, int histogramSize)
    {
        Index = index;
        HistogramSize = histogramSize;
    }
    public int Index { get; }
    public int HistogramSize { get; }
}

These classes represent the data at different stages of processing. Now let's look at the pipeline.

public async Task Dataflow()
{
    //Build our pipeline
    var options = new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount,
        //This is default but I want to point it out
        EnsureOrdered = true
    };
    var buildHistorgramBlock = new TransformBlock<IndexWithHistogramSize, HistogramWithIndex>(inputData =>
    {
        var histogram = Enumerable.Range(0, inputData.HistogramSize).Select(_ => some_func());
        return new HistogramWithIndex(histogram, inputData.Index);
    }, options);
    var doSomethingBlock = new TransformBlock<HistogramWithIndex, int>(x => do_something_with(x.Histogram.ToArray()), options);

    var resultBlock1 = new ActionBlock<int>(x => Results1.Add(x), options);
    //var resultBlock2 = new ActionBlock<int>(x => //insert into list with index, options);

    //link the blocks
    buildHistorgramBlock.LinkTo(doSomethingBlock, new DataflowLinkOptions() { PropagateCompletion = true });
    doSomethingBlock.LinkTo(resultBlock1, new DataflowLinkOptions() { PropagateCompletion = true });

    //Post data
    var histogramSize = 32768;
    foreach (var index in Enumerable.Range(0, 16384))
    {
        await buildHistorgramBlock.SendAsync(new IndexWithHistogramSize(index, histogramSize));
    }

    buildHistorgramBlock.Complete();
    await resultBlock1.Completion;
}

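The pipeline consists of two TransformBlocks and an ActionBlock linked together. The advantage is that it is easy to change the degree of parallelism, to bound the capacity of each block to introduce backpressure, and so on.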
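
As an illustration of the backpressure point, here is a minimal sketch that bounds a block with the BoundedCapacity option. It is not part of the pipeline above; BackpressureDemo, RunAsync, and the chosen numbers are assumptions made for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BackpressureDemo
{
    // Sends `count` items through a block that buffers at most `capacity`
    // items at a time and returns how many items were processed.
    public static async Task<int> RunAsync(int count, int capacity)
    {
        int processed = 0;
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 2,
            BoundedCapacity = capacity
        };
        var worker = new ActionBlock<int>(_ =>
        {
            Interlocked.Increment(ref processed);   // thread-safe counter
        }, options);

        for (int i = 0; i < count; i++)
        {
            // SendAsync waits asynchronously while the block is full,
            // whereas Post would return false and drop the item.
            await worker.SendAsync(i);
        }

        worker.Complete();
        await worker.Completion;
        return processed;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(1000, 8)); // prints 1000
    }
}
```

With a bound in place the producer loop slows down to the pace of the consumer instead of queueing all 16384 work items up front.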

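Important: when a TransformBlock runs in parallel, i.e. with MaxDegreeOfParallelism > 1, it still outputs its items in the order in which it received them. If items come in ordered, they leave ordered. You can turn ordering off with the block option EnsureOrdered. This matters if you want your items to end up at specific indices, with or without a particular ordering.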
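
If each result has to land at a specific index, one option is to carry the index through the pipeline, in which case ordering can be switched off. The sketch below shows that idea in isolation. Compute is a hypothetical stand-in for building the histogram and reducing it to a value, and the class and method names are assumptions.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class IndexedResultsDemo
{
    // Hypothetical per-item work standing in for the histogram steps.
    static int Compute(int index) => index * index;

    public static async Task<int[]> RunAsync(int count)
    {
        var results = new int[count];
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount,
            EnsureOrdered = false   // items may leave the block in any order
        };
        var compute = new TransformBlock<int, (int Index, int Value)>(
            index => (index, Compute(index)), options);

        // Every item carries its own index, so arrival order does not matter.
        var store = new ActionBlock<(int Index, int Value)>(item =>
        {
            results[item.Index] = item.Value;
        });

        compute.LinkTo(store, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < count; i++)
            await compute.SendAsync(i);

        compute.Complete();
        await store.Completion;
        return results;
    }

    public static async Task Main()
    {
        int[] results = await RunAsync(1000);
        Console.WriteLine(results.Select((v, i) => v == i * i).All(ok => ok));
    }
}
```

With EnsureOrdered left at its default of true, the same code also works; the block then holds finished items back until all earlier ones have been emitted.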

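This may look like overkill, and it may or may not suit your project. But I find it very flexible and easy to maintain. Especially once you start adding steps to the processing chain, adding a block is much cleaner than wrapping yet another for loop around everything.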

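Here is the rest of the boilerplate code for copy and paste. The snippets also need using directives for System, System.Collections.Concurrent, System.Collections.Generic, System.Linq, System.Threading.Tasks, and System.Threading.Tasks.Dataflow.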

private ConcurrentBag<int> Results1 = new ConcurrentBag<int>();
private int some_func() => 1;
private int do_something_with(int[] i) => i.First();

Regarding C# Parallel.For and uninitialized arrays, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51520017/
