
c++-amp - Object-oriented programming with C++ AMP


I need to update some code that implements the Aho-Corasick algorithm so that it runs on the GPU. However, the code relies heavily on an object-oriented programming model. My question is: can I pass objects to a parallel_for_each? If not, is there any way to avoid rewriting the whole code from scratch? I apologize if this seems like a naive question; C++ AMP is the first language I have used for GPU programming, so my experience in this area is very limited.

Best answer

The answer to your question is yes, in the sense that you can pass a class or struct to a lambda marked restrict(amp). Note that it is the lambda, not parallel_for_each itself, that is AMP-restricted.

However, you can only use types that the GPU supports. This is more a limitation of current GPU hardware than of C++ AMP. A short sketch of capturing such a struct follows the quote below.

A C++ AMP-compatible function or lambda can only use C++ AMP-compatible types, which include the following:

  • int
  • unsigned int
  • float
  • double
  • C-style arrays of int, unsigned int, float, or double
  • concurrency::array_view or references to concurrency::array
  • structs containing only C++ AMP-compatible types

This means that some data types are forbidden:

  • bool (can be used for local variables in the lambda)
  • char
  • short
  • long long
  • unsigned versions of the above

References and pointers (to a compatible type) may be used locally but cannot be captured by a lambda. Function pointers, pointer-to-pointer, and the like are not allowed; neither are static or global variables. Classes must meet more rules if you wish to use instances of them. They must have no virtual functions or virtual inheritance. Constructors, destructors, and other nonvirtual functions are allowed. The member variables must all be of compatible types, which could of course include instances of other classes as long as those classes meet the same rules.

... From the C++ AMP book, Ch. 3.
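
Here is a minimal sketch of what capturing such a struct looks like; the names (Transform, scale_and_offset) are hypothetical and not from the original answer, but the struct obeys the rules quoted above: only float members, no virtual functions, no captured pointers.

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

// Only AMP-compatible member types: no bool, char, short, pointers, or virtuals.
struct Transform
{
    float scale;
    float offset;

    float apply(float x) const restrict(amp, cpu)
    {
        return x * scale + offset;
    }
};

void scale_and_offset(std::vector<float>& data, Transform t)
{
    array_view<float, 1> av(static_cast<int>(data.size()), data);

    // The lambda, not parallel_for_each itself, carries restrict(amp).
    // t is captured by value; av provides GPU access to the data.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
    {
        av[idx] = t.apply(av[idx]);
    });
    av.synchronize();
}
```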



So, while you can do this, it may not be the best solution for performance reasons. CPU and GPU caches behave somewhat differently, which makes an array of structs the better choice for a CPU implementation, whereas GPUs usually perform better with a struct of arrays (sketched after the quote below).

GPU hardware is designed to provide the best performance when all threads within a warp are accessing consecutive memory and performing the same operations on that data. Consequently, it should come as no surprise that GPU memory is designed to be most efficient when accessed in this way. In fact, load and store operations to the same transfer line by different threads in a warp are coalesced into as little as a single transaction. The size of a transfer line is hardware-dependent, but in general, your code does not have to account for this if you focus on making memory accesses as contiguous as possible.

... Ch. 7.
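
A rough sketch of the two layouts; the names (Particle, integrate, the position and velocity views) are hypothetical and not taken from the answer. In the struct-of-arrays form, adjacent threads in the kernel read adjacent elements of each array_view, which is the coalesced access pattern the quote describes.

```cpp
#include <amp.h>
using namespace concurrency;

// Array of structs: convenient on the CPU, where one particle's fields
// share a cache line.
struct Particle { float x, y, z, mass; };

// Struct of arrays: each field lives in its own contiguous buffer, so
// thread i reads element i of each array_view and the warp's loads coalesce.
void integrate(array_view<float, 1> x, array_view<float, 1> vx, float dt)
{
    parallel_for_each(x.extent, [=](index<1> i) restrict(amp)
    {
        x[i] += vx[i] * dt;   // contiguous, per-thread access to each field
    });
}
```

With this layout the x components are contiguous in memory rather than interleaved with y, z, and mass, which is what the quoted passage means by consecutive memory access.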



If you look at the CPU and GPU implementations in my n-body example, you will see implementations of both approaches.

None of the above means your algorithm won't run faster when you move the implementation to C++ AMP; it just means you may be leaving some additional performance on the table. My suggestion is to do the simplest possible port first, and then decide whether you want to invest more time in optimizing the code, possibly rewriting it to take better advantage of the GPU's architecture.

Regarding c++-amp - Object-oriented programming with C++ AMP, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19762287/
