
Can I utilise a GPU to accelerate a non-graphics-related operation in C#, such as a parallel for loop?




I have the following CRC calculation that is executed 12 times in parallel on different data sources.



Can I offload this to the GPU once the CPU thread count is exhausted, or is the GPU not suited to such tasks, so that it does not make sense to do such a calculation on the GPU?



If this is the wrong place to ask this question, could you please suggest where it should be asked?



private static readonly uint[] _crcLookup = new uint[256]; // 256-entry CRC-32 lookup table, populated elsewhere

public static uint CalculateCRC(byte[] data, uint lower, uint upper)
{
    // Table-driven CRC-32 over data[lower..upper] (upper bound inclusive).
    uint crc = uint.MaxValue;
    uint addr = lower;
    while (addr <= upper)
    {
        crc = (crc >> 8) ^ _crcLookup[(byte)(data[addr] ^ crc)];
        addr++;
    }

    return ~crc;
}
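
The snippet above never shows how _crcLookup is populated. Assuming the intent is a standard reflected CRC-32 with polynomial 0xEDB88320 (only a guess on my part), the 256-entry table would typically be built once like this:

// Builds the 256-entry lookup table for a reflected CRC-32.
// Assumption: the standard polynomial 0xEDB88320 is wanted; swap it out
// if the project uses a different CRC variant.
private static uint[] BuildCrcTable()
{
    var table = new uint[256];
    for (uint i = 0; i < 256; i++)
    {
        uint entry = i;
        for (int bit = 0; bit < 8; bit++)
            entry = (entry & 1) != 0 ? (entry >> 1) ^ 0xEDB88320u : entry >> 1;
        table[i] = entry;
    }
    return table;
}

// e.g. private static readonly uint[] _crcLookup = BuildCrcTable();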

Parallel implementation



// One (byte[] data, uint lower, uint upper) tuple per data source; filled elsewhere with the 12 segments.
var dataSegments = new ConcurrentBag<(byte[] data, uint lower, uint upper)>();

Parallel.ForEach(dataSegments, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, segment =>
{
    uint result = CalculateCRC(segment.data, segment.lower, segment.upper);
    // Do something with the result...
});
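
For completeness, here is a sketch of how dataSegments might be filled with the 12 sources before the loop runs; ReadSource is a hypothetical helper standing in for however the source data is actually loaded:

// Illustrative only: one (data, lower, upper) tuple per data source.
// 'ReadSource' is a hypothetical placeholder for loading the bytes of source i.
for (int i = 0; i < 12; i++)
{
    byte[] buffer = ReadSource(i);
    dataSegments.Add((buffer, 0u, (uint)(buffer.Length - 1))); // upper bound is inclusive
}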

More replies

The C# compiler will optimize the code into assembly language automatically. The compiler knows which code should run as GPU assembly code when needed. GPUs are designed to do large numbers of multiplications. Your code uses shifts and adds, which are better done with the CPU's shift/add instructions.

Does the C# JIT really offload to the GPU automatically? Is there any documentation about this? My goal was to use the GPU, even if it was slower, simply to get better utilisation of an otherwise dormant piece of silicon.

No, it will not. You need to use an appropriate library and the corresponding patterns. But if you do, the compiler will produce CPU code and GPU code, which are not the same instruction set.

Consider using built-in classes such as learn.microsoft.com/en-us/dotnet/api/…
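
The link is truncated, but it most likely points to the System.IO.Hashing.Crc32 class from the System.IO.Hashing NuGet package (an assumption on my part). A minimal usage sketch:

using System.IO;
using System.IO.Hashing;

// Computes a CRC-32 over a whole buffer with the built-in Crc32 class.
// HashToUInt32 exists in newer versions of the package; older ones only
// expose Hash, which returns the four CRC bytes instead.
byte[] data = File.ReadAllBytes("source0.bin");   // hypothetical input file
uint crc = Crc32.HashToUInt32(data);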

I wrote a toy project for computations like that with OpenCL: github.com/tugrul512bit/Cekirdekler/wiki/Load-Balancing. You can use your GPU directly within C# for simple math. Your CRC computation looks a bit serial, like Mandelbrot, but if you need it independently for thousands of different pieces of data, with thousands of CRCs in flight, then it can be parallelized easily, just like Mandelbrot set generation.
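
If the workload really is thousands of independent CRCs in flight, here is a rough sketch of that idea in C# using the ILGPU library rather than Cekirdekler (my substitution, not the commenter's). The calls follow ILGPU's published samples, but treat the exact API names as an assumption to verify, and note that the data-transfer overhead discussed in the answer below still applies.

using ILGPU;
using ILGPU.Runtime;

static class GpuCrcSketch
{
    // Each GPU thread computes the CRC-32 of one equally sized segment of 'data'.
    // 'table' is the 256-entry CRC-32 lookup table; 'results' receives one CRC per segment.
    static void CrcKernel(Index1D index, ArrayView<byte> data, ArrayView<uint> table,
                          ArrayView<uint> results, int segmentLength)
    {
        int i = index;                      // Index1D converts implicitly to int
        uint crc = uint.MaxValue;
        int start = i * segmentLength;
        for (int j = 0; j < segmentLength; j++)
            crc = (crc >> 8) ^ table[(int)((data[start + j] ^ crc) & 0xFF)];
        results[i] = ~crc;
    }

    public static uint[] CrcOnGpu(byte[] allData, uint[] crcTable, int segmentLength, int segmentCount)
    {
        using var context = Context.CreateDefault();
        using var accelerator = context.GetPreferredDevice(preferCPU: false)
                                       .CreateAccelerator(context);

        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<byte>, ArrayView<uint>, ArrayView<uint>, int>(CrcKernel);

        // Copy inputs to the device, launch one thread per segment, copy the results back.
        using var dataBuffer = accelerator.Allocate1D(allData);
        using var tableBuffer = accelerator.Allocate1D(crcTable);
        using var resultBuffer = accelerator.Allocate1D<uint>(segmentCount);

        kernel(segmentCount, dataBuffer.View, tableBuffer.View, resultBuffer.View, segmentLength);
        accelerator.Synchronize();
        return resultBuffer.GetAsArray1D();
    }
}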

Recommended answer


Can I offload this to the GPU once the CPU thread count is exhausted, or is the GPU not suited to such tasks, so that it does not make sense to do such a calculation on the GPU?



Technically yes, but only with great difficulty. You cannot run arbitrary C# code on a GPU, so you would likely need to write the GPU code in some other language, with all the complexity that entails.


But chances are you will only see a performance reduction. CRC calculations should be I/O-limited if the code is decently optimized, so the extra overhead of transferring data to the GPU would very likely cost more than it could gain. Also, GPUs are designed for massive parallelism: in the while (addr <= upper) loop you have a dependency between iterations, so it cannot be directly parallelized. It might be possible to do some hierarchical version of CRC, but even then the overhead would be prohibitive.


Parallelizing over the 12 different data sources should be done on different CPU threads, not on the GPU. A single CPU core is much faster than a single "GPU core"; you only have a single- or double-digit number of CPU cores, but possibly thousands of GPU cores, so the GPU only pays off when there are thousands of independent work items.



Here is the code:



private static readonly uint[] _crcLookup = new uint[256]; // 256-entry CRC-32 lookup table, populated elsewhere

public static uint CalculateCRC(byte[] data, uint lower, uint upper)
{
    // Table-driven CRC-32 over data[lower..upper] (upper bound inclusive).
    uint crc = uint.MaxValue;
    uint addr = lower;
    while (addr <= upper)
    {
        crc = (crc >> 8) ^ _crcLookup[(byte)(data[addr] ^ crc)];
        addr++;
    }

    return ~crc;
}

// One (byte[] data, uint lower, uint upper) tuple per data source; filled elsewhere with the 12 segments.
var dataSegments = new ConcurrentBag<(byte[] data, uint lower, uint upper)>();

Parallel.ForEach(dataSegments, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, segment =>
{
    uint result = CalculateCRC(segment.data, segment.lower, segment.upper);
    // Do something with the result...
});

Using a GPU for non-graphics operations in C# like a CRC calculation might not be straightforward. While there are libraries like CUDA.NET that allow for GPU programming in C#, it's important to note that not all operations benefit from GPU parallelization. In some cases, the overhead of transferring data between CPU and GPU memory can outweigh the benefits gained from parallelization.



More replies

I am failing to see how this is different from the OP's code in the question...
