
Can I utilise a GPU to accelerate a non-graphics-related operation in C#, such as a parallel for loop?




I have the following CRC calculation that is executed 12 times in parallel on different data sources.



Can I offload this to the GPU once the CPU thread count is exhausted, or is the GPU not suited to such tasks, so that it does not make sense to do such a calculation on the GPU?



If this is the wrong place to ask this question, could you please suggest where it should be asked?



private static readonly uint[] _crcLookup = new uint[256]; // 256-entry CRC-32 lookup table, populated elsewhere

public static uint CalculateCRC(byte[] data, uint lower, uint upper)
{
    // Table-driven CRC-32 over data[lower..upper] (upper bound inclusive).
    uint crc = uint.MaxValue;
    uint addr = lower;
    while (addr <= upper)
    {
        crc = (crc >> 8) ^ _crcLookup[(byte)(data[addr] ^ crc)];
        addr++;
    }

    return ~crc;
}
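
The snippet above never shows how _crcLookup is populated. Assuming the intent is a standard reflected CRC-32 with polynomial 0xEDB88320 (only a guess on my part), the 256-entry table would typically be built once like this:

// Builds the 256-entry lookup table for a reflected CRC-32.
// Assumption: the standard polynomial 0xEDB88320 is wanted; swap it out
// if the project uses a different CRC variant.
private static uint[] BuildCrcTable()
{
    var table = new uint[256];
    for (uint i = 0; i < 256; i++)
    {
        uint entry = i;
        for (int bit = 0; bit < 8; bit++)
            entry = (entry & 1) != 0 ? (entry >> 1) ^ 0xEDB88320u : entry >> 1;
        table[i] = entry;
    }
    return table;
}

// e.g. private static readonly uint[] _crcLookup = BuildCrcTable();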

Parallel implementation



// One (byte[] data, uint lower, uint upper) tuple per data source; filled elsewhere with the 12 segments.
var dataSegments = new ConcurrentBag<(byte[] data, uint lower, uint upper)>();

Parallel.ForEach(dataSegments, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, segment =>
{
    uint result = CalculateCRC(segment.data, segment.lower, segment.upper);
    // Do something with the result...
});
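
For completeness, here is a sketch of how dataSegments might be filled with the 12 sources before the loop runs; ReadSource is a hypothetical helper standing in for however the source data is actually loaded:

// Illustrative only: one (data, lower, upper) tuple per data source.
// 'ReadSource' is a hypothetical placeholder for loading the bytes of source i.
for (int i = 0; i < 12; i++)
{
    byte[] buffer = ReadSource(i);
    dataSegments.Add((buffer, 0u, (uint)(buffer.Length - 1))); // upper bound is inclusive
}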

More replies

The C# compiler will optimize the code into assembly language automatically. The compiler knows which code should run as GPU assembly code when needed. GPUs are designed to do large numbers of multiplications. Your code uses shifts and adds, which are better done with the CPU's shift/add instructions.

Does the C# JIT really offload to the GPU automatically? Is there any documentation about this? My goal was to use the GPU, even if it was slower, simply to get better utilisation of an otherwise dormant piece of silicon.

No, it will not. You need to use an appropriate library and the corresponding patterns. But if you do, the compiler will produce CPU code and GPU code, which are not the same instruction set.

Consider using built-in classes such as learn.microsoft.com/en-us/dotnet/api/…
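
The link is truncated, but it most likely points to the System.IO.Hashing.Crc32 class from the System.IO.Hashing NuGet package (an assumption on my part). A minimal usage sketch:

using System.IO;
using System.IO.Hashing;

// Computes a CRC-32 over a whole buffer with the built-in Crc32 class.
// HashToUInt32 exists in newer versions of the package; older ones only
// expose Hash, which returns the four CRC bytes instead.
byte[] data = File.ReadAllBytes("source0.bin");   // hypothetical input file
uint crc = Crc32.HashToUInt32(data);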

I wrote a toy project for computations like that with OpenCL: github.com/tugrul512bit/Cekirdekler/wiki/Load-Balancing. You can use your GPU directly within C# for simple math. Your CRC computation looks a bit serial, like Mandelbrot, but if you need it independently for thousands of different pieces of data, with thousands of CRCs in flight, then it can be parallelized easily, just like Mandelbrot set generation.
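
If the workload really is thousands of independent CRCs in flight, here is a rough sketch of that idea in C# using the ILGPU library rather than Cekirdekler (my substitution, not the commenter's). The calls follow ILGPU's published samples, but treat the exact API names as an assumption to verify, and note that the data-transfer overhead discussed in the answer below still applies.

using ILGPU;
using ILGPU.Runtime;

static class GpuCrcSketch
{
    // Each GPU thread computes the CRC-32 of one equally sized segment of 'data'.
    // 'table' is the 256-entry CRC-32 lookup table; 'results' receives one CRC per segment.
    static void CrcKernel(Index1D index, ArrayView<byte> data, ArrayView<uint> table,
                          ArrayView<uint> results, int segmentLength)
    {
        int i = index;                      // Index1D converts implicitly to int
        uint crc = uint.MaxValue;
        int start = i * segmentLength;
        for (int j = 0; j < segmentLength; j++)
            crc = (crc >> 8) ^ table[(int)((data[start + j] ^ crc) & 0xFF)];
        results[i] = ~crc;
    }

    public static uint[] CrcOnGpu(byte[] allData, uint[] crcTable, int segmentLength, int segmentCount)
    {
        using var context = Context.CreateDefault();
        using var accelerator = context.GetPreferredDevice(preferCPU: false)
                                       .CreateAccelerator(context);

        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<byte>, ArrayView<uint>, ArrayView<uint>, int>(CrcKernel);

        // Copy inputs to the device, launch one thread per segment, copy the results back.
        using var dataBuffer = accelerator.Allocate1D(allData);
        using var tableBuffer = accelerator.Allocate1D(crcTable);
        using var resultBuffer = accelerator.Allocate1D<uint>(segmentCount);

        kernel(segmentCount, dataBuffer.View, tableBuffer.View, resultBuffer.View, segmentLength);
        accelerator.Synchronize();
        return resultBuffer.GetAsArray1D();
    }
}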

Recommended answer


Can I offload this to the GPU once the CPU thread count is exhausted, or is the GPU not suited to such tasks, so that it does not make sense to do such a calculation on the GPU?



Technically yes, but only with great difficulty. You cannot run arbitrary C# code on a GPU, so you would likely need to write the GPU code in some other language, with all the complexity that entails.


But chances are you will only see a performance reduction. CRC calculations should be I/O-limited if the code is decently optimized, so the extra overhead of transferring data to the GPU would very likely cost more than it could gain. Also, GPUs are designed for massive parallelism: in the while (addr <= upper) loop you have a dependency between iterations, so it cannot be directly parallelized. It might be possible to do some hierarchical version of CRC, but even then the overhead would be prohibitive.


Parallelizing over the 12 different data sources should be done on different CPU threads, not on the GPU. A single CPU core is much faster than a single "GPU core"; you only have a single- or double-digit number of CPU cores, but possibly thousands of GPU cores, so the GPU only pays off when there are thousands of independent work items.



Here is the code:



private static readonly uint[] _crcLookup = new uint[256]; // 256-entry CRC-32 lookup table, populated elsewhere

public static uint CalculateCRC(byte[] data, uint lower, uint upper)
{
    // Table-driven CRC-32 over data[lower..upper] (upper bound inclusive).
    uint crc = uint.MaxValue;
    uint addr = lower;
    while (addr <= upper)
    {
        crc = (crc >> 8) ^ _crcLookup[(byte)(data[addr] ^ crc)];
        addr++;
    }

    return ~crc;
}

// One (byte[] data, uint lower, uint upper) tuple per data source; filled elsewhere with the 12 segments.
var dataSegments = new ConcurrentBag<(byte[] data, uint lower, uint upper)>();

Parallel.ForEach(dataSegments, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, segment =>
{
    uint result = CalculateCRC(segment.data, segment.lower, segment.upper);
    // Do something with the result...
});

Using a GPU for non-graphics operations in C# like a CRC calculation might not be straightforward. While there are libraries like CUDA.NET that allow for GPU programming in C#, it's important to note that not all operations benefit from GPU parallelization. In some cases, the overhead of transferring data between CPU and GPU memory can outweigh the benefits gained from parallelization.



More replies

I am failing to see how this is different from the OP's code in the question...
