How to abstract typical memory behavior from a complex multi-threaded workload?


Reposted by: bug小助手 | Updated: 2023-10-25 19:44:19


A complex multi-threaded program is running on my system and performs various types of operations. I do not have access to its source code.


I want to analyze how its memory access behavior affects performance, so that I can optimize further.


Is there a way to identify typical memory access patterns and their proportions? For example, sequential memory reads and writes, linked-list traversal, data sharing between threads, etc.


Intel's Pin tool can generate load/store traces. Can it be used for the above analysis? If so, how should I go about it?


Are there other tools or methodologies that I could look at?


More answers

If you're looking for "bad" patterns like linked-list or tree traversal, you'd be looking for loads whose address depends on the result of another recent load (not counting loads of locals on the stack, such as [rsp+const] addressing modes, unless a store/reload is part of the dependency chain). I don't know whether Pin can help detect such patterns.
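As a rough illustration of the "address depends on a recent load" idea, here is a minimal Python sketch that post-processes a load trace. Everything about the trace format is an assumption: the `Load` record (thread id, instruction address, effective address, loaded value), the `window` and `max_offset` parameters, and the function name are hypothetical, and the trace would have to come from a tool that records loaded values as well as addresses. It approximates a dependency by checking whether a load's address falls just above a value returned by a recent load on the same thread. It does not track registers, so it is only a heuristic, and it does not filter out stack accesses.

```python
from collections import deque
from typing import Dict, Iterable, NamedTuple


class Load(NamedTuple):
    """One record of a hypothetical load trace (format is an assumption)."""
    tid: int    # thread id
    pc: int     # address of the load instruction
    addr: int   # effective address that was read
    value: int  # value the load returned


def dependent_load_fraction(trace: Iterable[Load],
                            window: int = 16,
                            max_offset: int = 256) -> float:
    """Fraction of loads whose address lies within max_offset bytes above
    a value returned by one of the previous `window` loads on the same
    thread. A high fraction hints at pointer chasing."""
    recent: Dict[int, deque] = {}  # tid -> recently loaded values
    total = 0
    dependent = 0
    for ld in trace:
        values = recent.setdefault(ld.tid, deque(maxlen=window))
        # Heuristic: treat a recently loaded value as a base pointer if the
        # current address is that value plus a small field offset.
        if any(0 <= ld.addr - v < max_offset for v in values):
            dependent += 1
        values.append(ld.value)
        total += 1
    return dependent / total if total else 0.0


# Synthetic traces: a three-node linked list and a sequential array scan.
linked = [Load(1, 0x400100, 0x1000, 0x5000),
          Load(1, 0x400100, 0x5000, 0x9000),
          Load(1, 0x400100, 0x9000, 0)]
array = [Load(1, 0x400200, 0x2000 + 8 * i, i) for i in range(8)]

print(dependent_load_fraction(linked))
print(dependent_load_fraction(array))
```

On the synthetic traces, the linked-list walk scores high (every load after the first follows a pointer) while the array scan scores zero. Real traces would need tuning of `window` and `max_offset`, and probably a filter for stack addresses as described above.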


Other than that, the main things I'd look for would be loads that tend to miss in cache, especially ones that end up stalling execution. For example, on Intel (such as Skylake), there's an event cycle_activity.stalls_l3_miss that might be one to look for: "Execution stalls while L3 cache miss demand load is outstanding." An "execution stall" is a cycle in which no uops are dispatched from the scheduler to an execution unit. There are lots of other events for cache misses at various levels, and for lines in/out.
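One way to read such a counter on Linux without source access is to attach `perf stat` to the running process. The sketch below is a hedged example, not a definitive recipe: the helper names are hypothetical, it assumes `perf` is installed, that the CPU exposes `cycle_activity.stalls_l3_miss`, and that the event names in the output match the names requested (some systems append modifiers or a PMU prefix). It relies on `perf stat -x,`, which writes comma-separated lines beginning with the counter value, unit, and event name.

```python
import subprocess
from typing import Dict, List


def perf_stat_cmd(pid: int, events: List[str], seconds: int = 10) -> List[str]:
    """Build a perf stat command that attaches to `pid` for `seconds`."""
    return ["perf", "stat", "-x", ",", "-e", ",".join(events),
            "-p", str(pid), "--", "sleep", str(seconds)]


def parse_perf_csv(text: str) -> Dict[str, int]:
    """Parse `perf stat -x,` output into {event_name: count}.
    Lines without a numeric count (comments, "<not counted>",
    "<not supported>") are skipped."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        counts[fields[2]] = int(fields[0])
    return counts


def stall_fraction(counts: Dict[str, int],
                   stall_event: str = "cycle_activity.stalls_l3_miss",
                   cycles_event: str = "cycles") -> float:
    """Share of cycles stalled with an L3-miss demand load outstanding."""
    return counts[stall_event] / counts[cycles_event]


def measure(pid: int, seconds: int = 10) -> float:
    """Run perf against a live process (needs suitable permissions)."""
    events = ["cycle_activity.stalls_l3_miss", "cycles"]
    result = subprocess.run(perf_stat_cmd(pid, events, seconds),
                            capture_output=True, text=True, check=True)
    # perf stat writes its counter report to stderr.
    return stall_fraction(parse_perf_csv(result.stderr))


# Parsing a made-up sample of the CSV output (numbers are illustrative).
sample = ("1000,,cycle_activity.stalls_l3_miss,10000,100.00,,\n"
          "4000,,cycles,10000,100.00,,\n")
print(stall_fraction(parse_perf_csv(sample)))  # prints 0.25
```

A large stall fraction suggests the workload is frequently waiting on memory. Sampling the same event with `perf record` could then help locate which instructions are responsible.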



