
assembly - Correct use of the ARM PLD instruction (ARM11)

Reposted. Author: 行者123. Updated: 2023-12-02 22:14:05

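ARM does not actually document much about how to use this instruction properly, but I have seen it used elsewhere and know that it takes an address as a hint about where the next value will be read from.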

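My question is: given a tight 256-byte copy loop of ldm/stm instructions, say r4-r11 x 8, is it better to prefetch every cache line before the copy, to prefetch between each instruction pair, or not to prefetch at all, given that the memcpy in question does not read and write the same area of memory? I am fairly sure my cache line size is 64 bytes, but it may be 32 bytes; I am waiting for confirmation before writing the final code here.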

Best answer

From the Cortex-A Series Programmer's Guide, section 17.4 (note: some details may differ for ARM11):

Best performance for memcpy() is achieved using LDM of a whole cache line and then writing these values with an STM of a whole cache line. Alignment of the stores is more important than alignment of the loads. The PLD instruction should be used where possible. There are four PLD slots in the load/store unit. A PLD instruction takes precedence over the automatic pre-fetcher and has no cost in terms of the integer pipeline performance. The exact timing of PLD instructions for best memcpy() can vary slightly between systems, but PLD to an address three cache lines ahead of the currently copying line is a useful starting point.
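The advice quoted above can be sketched in C rather than hand-written assembly. This is only an illustration under stated assumptions, not a tuned ARM11 routine: it assumes a GCC- or Clang-compatible compiler (`__builtin_prefetch` is a compiler builtin that these compilers typically lower to PLD on ARM targets that have it), a 32-byte cache line (the question was unsure between 32 and 64), and the "three lines ahead" starting distance from the guide. The names `copy_lines_with_prefetch`, `CACHE_LINE_BYTES` and `PREFETCH_LINES_AHEAD` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines. Set to 64 if that is what the core uses. */
#define CACHE_LINE_BYTES 32u
/* Starting point suggested by the quoted guide; tune by measurement. */
#define PREFETCH_LINES_AHEAD 3u
#define WORDS_PER_LINE (CACHE_LINE_BYTES / sizeof(uint32_t))

/* Hypothetical helper: copies `bytes` bytes one cache line at a time.
 * `bytes` must be a multiple of CACHE_LINE_BYTES and the buffers must
 * not overlap (as in the question). */
static void copy_lines_with_prefetch(uint32_t *restrict dst,
                                     const uint32_t *restrict src,
                                     size_t bytes)
{
    assert(bytes % CACHE_LINE_BYTES == 0);
    size_t lines = bytes / CACHE_LINE_BYTES;

    for (size_t line = 0; line < lines; ++line) {
        /* Hint the source line a few lines ahead of the one being copied.
         * The bounds check keeps the C address expression inside the buffer. */
        if (line + PREFETCH_LINES_AHEAD < lines) {
            __builtin_prefetch(src + (line + PREFETCH_LINES_AHEAD) * WORDS_PER_LINE,
                               0 /* prefetch for read */,
                               3 /* high temporal locality */);
        }

        /* Copy one whole cache line; in assembly this would be one
         * LDM/STM pair (or two, for a 64-byte line with r4-r11). */
        const uint32_t *s = src + line * WORDS_PER_LINE;
        uint32_t *d = dst + line * WORDS_PER_LINE;
        for (size_t w = 0; w < WORDS_PER_LINE; ++w) {
            d[w] = s[w];
        }
    }
}
```

For the 256-byte block in the question, the call would be `copy_lines_with_prefetch(dst, src, 256)`. With so few lines per call, only the first few iterations issue a prefetch at this distance, so whether it helps at all is something to measure on the actual ARM11 part.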

Regarding assembly - Correct use of the ARM PLD instruction (ARM11), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6414555/
