A complex multi-threaded program is running on my system, performing various types of operations. I do not have access to its source code.
I want to analyze the potential impact of its memory access behavior on performance for further optimization.
Is there a way to identify typical memory access behaviors and their proportions? For example, sequential memory reads and writes, linked-list traversal, data sharing between threads, etc.
Intel's Pin tool can generate a load/store trace. Can it be used for the above analysis? If so, what would need to be done?
Are there other tools or methodologies that I can refer to?
More answers
If you're looking for "bad" patterns like linked-list or tree traversal, you'd be looking for loads whose address depends on the result of another recent load (not counting loads of locals from the stack, like [rsp+const] addressing modes, unless a store/reload is part of the dependency chain). IDK if Pin can help detect such patterns.
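To make the "address depends on a recent load" idea concrete, here is a minimal offline sketch in Python. It is not a Pin tool, and it assumes a hypothetical trace format that you would have to produce yourself: one record per access with the thread id, "R" or "W", the address, and the pointer-sized value a load returned. The function name, the 64-byte line size, the window of 8 loads, and the 256-byte offset limit are all illustrative assumptions. Matching addresses against loaded values is only a heuristic stand-in for real register dependency tracking, so expect some false positives and negatives.

```python
from collections import defaultdict, deque

CACHE_LINE = 64  # assumed cache-line size in bytes


def classify_trace(trace, window=8, max_offset=256):
    """Bucket each access in a (hypothetical) load/store trace.

    trace: iterable of (tid, op, addr, value); op is "R" or "W", and value is
    the pointer-sized data a load returned (None for stores or if unknown).
    Returns (per-category access counts, number of shared written lines).
    """
    counts = {"pointer_chase": 0, "sequential": 0, "other": 0}
    last_addr = {}                                       # tid -> previous address
    recent = defaultdict(lambda: deque(maxlen=window))   # tid -> recently loaded values
    touched_by = defaultdict(set)                        # cache line -> threads accessing it
    written_by = defaultdict(set)                        # cache line -> threads writing it

    for tid, op, addr, value in trace:
        line = addr // CACHE_LINE
        touched_by[line].add(tid)
        if op == "W":
            written_by[line].add(tid)

        prev = last_addr.get(tid)
        # Heuristic: a load whose address sits at a small non-negative offset
        # from a value this thread loaded recently looks like pointer chasing.
        if op == "R" and any(0 <= addr - v <= max_offset for v in recent[tid]):
            counts["pointer_chase"] += 1
        # Otherwise, an access within one line of the thread's previous
        # access is counted as sequential.
        elif prev is not None and 0 < abs(addr - prev) <= CACHE_LINE:
            counts["sequential"] += 1
        else:
            counts["other"] += 1

        last_addr[tid] = addr
        if op == "R" and value is not None:
            recent[tid].append(value)

    # Lines touched by more than one thread with at least one writer are
    # candidates for true or false sharing.
    shared = sum(1 for line, tids in touched_by.items()
                 if len(tids) > 1 and written_by.get(line))
    return counts, shared
```

Dividing each count by the total number of records gives the rough proportions asked about in the question. The per-thread state means the result does not depend on how the threads' records are interleaved in the trace.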
Other than that, the main things I'd look for would be loads that tend to miss in cache, especially ones that end up stalling execution. For example, on Intel CPUs such as Skylake, there's an event cycle_activity.stalls_l3_miss that might be one to look at: "Execution stalls while L3 cache miss demand load is outstanding." An "execution stall" is a cycle in which no uops are dispatched from the scheduler to an execution unit. There are lots of other events for cache misses at various levels, and for lines in/out.