gcc - Poor performance due to hyper-threading with OpenMP: how to bind threads to cores


Reposted by: 行者123 Updated: 2023-12-03 20:57:26


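I'm developing large dense matrix multiplication code. When I profile the code, it sometimes reaches about 75% of the peak FLOPS of my four-core system and other times only about 36%. The efficiency does not change during a run: it either starts at 75% and stays there, or starts at 36% and stays there.

I have traced the problem to hyper-threading and to the fact that I set the number of threads to four instead of the default eight. When I disable hyper-threading in the BIOS, I consistently get about 75% efficiency (or at least I never see the drastic drop to 36%).

Before calling any parallel code I call omp_set_num_threads(4). I have also tried export OMP_NUM_THREADS=4 before running my code, but it seems to be equivalent.

I don't want to disable hyper-threading in the BIOS. I think I need to bind the four threads to the four cores. I have tested a few different settings of GOMP_CPU_AFFINITY, but so far the efficiency still sometimes drops to 36%. What is the mapping between hyper-threads and cores? For example, do thread 0 and thread 1 correspond to the same core, and thread 2 and thread 3 to another core?

How can I bind the threads to each core, without thread migration, so that I don't have to disable hyper-threading in the BIOS? Maybe I need to look into using sched_setaffinity?

Some details of my current system: Linux kernel 3.13, GCC 4.8, Intel Xeon E5-1620 (four physical cores, eight hyper-threads).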

Edit:
So far, this seems to work well:

export GOMP_CPU_AFFINITY="0 1 2 3 4 5 6 7"

or

export GOMP_CPU_AFFINITY="0-7"

Edit:
This also seems to work well:

export OMP_PROC_BIND=true

Edit:
These options also work well (gemm is the name of my executable):

numactl -C 0,1,2,3 ./gemm

and

taskset -c 0,1,2,3 ./gemm
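
On the mapping question raised above: on Linux, the sibling hyper-threads of each core can usually be read from sysfs. The following is a minimal sketch and not part of the original post. It assumes the thread_siblings_list files exist under /sys/devices/system/cpu, and one_cpu_per_core is a hypothetical helper name chosen for illustration. It prints one logical CPU per physical core as a comma-separated list.

```shell
#!/bin/sh
# Print one logical CPU per physical core, based on the sysfs topology files.
# Assumption: Linux exposes cpuN/topology/thread_siblings_list under the given
# root (default /sys/devices/system/cpu). The function name is illustrative.
one_cpu_per_core() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        # Each file lists the hyper-threads that share a core, e.g. "0,4" or "0-1".
        # Keep only the first entry, which is the same for all siblings of a core.
        sed 's/[,-].*//' "$f"
    done | sort -n -u | paste -s -d, -
}
```

If the helper prints, say, 0,1,2,3 on a given machine, it could be used as export GOMP_CPU_AFFINITY="$(one_cpu_per_core)" or as taskset -c "$(one_cpu_per_core)" ./gemm. Note that taskset only restricts the process to that set of CPUs; threads may still move between the CPUs in the set.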

Best answer

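This isn't a direct answer to your question, but it might be worth looking into: apparently, hyperthreading can cause your cache to thrash. Have you tried using valgrind to see what kind of issue is causing your problem? There might be a quick fix in allocating some junk at the top of every thread's stack, so that your threads don't end up kicking each other's cache lines out.

It looks like your CPU is 4-way set associative, so it's not unreasonable to think that, across 8 threads, you might end up with some very unfortunately aligned accesses. If your matrices are aligned on a multiple of the cache size, and if pairs of threads access areas a multiple of the cache size apart, any incidental read by a third thread would be enough to start causing conflict misses.

For a quick test: if you change your input matrices to a size that is not a multiple of the cache size (so they are no longer aligned on a boundary) and your problems disappear, there is a good chance that you are dealing with conflict misses.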
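
The arithmetic behind that quick test can be sketched as follows. This is an illustration only: the cache geometry (64-byte lines, 64 sets) and the 4096-byte row size are assumed values, not measurements of the Xeon E5-1620, and cache_set is a hypothetical helper name. With an N-way set-associative cache, at most N lines that map to the same set can be resident at once, so addresses that keep hitting one set evict each other.

```shell
#!/bin/sh
# Set index of a byte address in a set-associative cache:
#   set = (address / line_size) mod num_sets
# All numbers below are illustrative assumptions, not measured values.
cache_set() {
    addr=$1; line_size=$2; num_sets=$3
    echo $(( (addr / line_size) % num_sets ))
}

# Hypothetical cache: 64-byte lines and 64 sets, so addresses that are a
# multiple of 64 * 64 = 4096 bytes apart share a set.
LINE=64
SETS=64

# Row starts exactly 4096 bytes apart all map to the same set (prints 0 four times).
for row in 0 1 2 3; do
    cache_set $(( row * 4096 )) "$LINE" "$SETS"
done

# Padding each row by one cache line spreads the row starts over
# different sets (prints 0, 1, 2, 3).
for row in 0 1 2 3; do
    cache_set $(( row * (4096 + 64) )) "$LINE" "$SETS"
done
```

Changing the matrix size, as the answer suggests, has the same effect as the padding in the second loop: the strides stop being a multiple of the set stride.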

For this topic (gcc - Poor performance due to hyper-threading with OpenMP: how to bind threads to cores), the original question can be found on Stack Overflow: https://stackoverflow.com/questions/24368576/
