
performance - Is cache prefetching done in the hardware (physical) address space or in the virtual address space?

Reposted. Author: 行者123. Updated: 2023-12-03 16:42:00

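Does the hardware prefetcher operate on contiguous virtual addresses or on contiguous hardware (physical) addresses? Imagine a large byte array that spans several pages. In the virtual address space the bytes are contiguous, but the pages may actually be backed by disjoint physical pages. I would expect the prefetcher to use the TLB to do the proper translation before it starts bringing in cache lines that belong to the next page.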

Is that the case?
I couldn't find any information confirming this and hope someone can offer more insight.

I'm mainly asking about x86, but any insight would be appreciated.
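To make the scenario concrete, here is a minimal sketch in C, assuming 4 KiB pages and 64-byte cache lines (the usual x86 defaults; huge pages would change the numbers). A virtual address splits into a virtual page number, which the TLB translates, and a page offset, which translation leaves unchanged. Only the offset carries over to the physical address, so stepping past the last line of a page in physical space would land in an unrelated frame. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages and 64-byte cache lines. */
#define PAGE_SHIFT 12u
#define PAGE_BYTES ((uint64_t)1 << PAGE_SHIFT)
#define LINE_SHIFT 6u
#define LINES_PER_PAGE (PAGE_BYTES >> LINE_SHIFT) /* 64 */

/* Virtual page number: the part of the address the TLB translates. */
static uint64_t vpn(uint64_t vaddr) { return vaddr >> PAGE_SHIFT; }

/* Page offset: copied unchanged into the physical address. */
static uint64_t page_offset(uint64_t vaddr) { return vaddr & (PAGE_BYTES - 1); }

/* Number of distinct pages touched by the byte range [start, start + len). */
static uint64_t pages_spanned(uint64_t start, uint64_t len) {
    assert(len > 0);
    return vpn(start + len - 1) - vpn(start) + 1;
}

/* How many further cache lines follow the one containing vaddr before the
   page ends, i.e. how far ahead a page-bounded prefetcher could reach. */
static uint64_t lines_ahead_in_page(uint64_t vaddr) {
    return (LINES_PER_PAGE - 1) - (page_offset(vaddr) >> LINE_SHIFT);
}
```

For example, `pages_spanned(0x1800, 3 * 4096)` is 4: a 12 KiB array that does not start on a page boundary touches four pages, each of which may map to an unrelated physical frame.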

Best answer

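I can't answer this for AMD processors, but I can for Intel processors.
As far as I know, the hardware prefetchers on current Intel processors should not prefetch cache lines across page boundaries.
From the Intel® 64 and IA-32 Architectures Optimization Reference Manual, section 7.5.2, Hardware Prefetch: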

Automatic hardware prefetch can bring cache lines into the unified last-level cache based on prior data misses. It will attempt to prefetch two cache lines ahead of the prefetch stream. Characteristics of the hardware prefetcher are:

  • [...]
  • It will not prefetch across a 4-KByte page boundary. A program has to initiate demand loads for the new page before the hardware prefetcher starts prefetching from the new page.
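The page-boundary rule quoted above can be modeled as follows. This is a simplified, hypothetical model, not Intel's implementation: after a miss it proposes the next `ahead` cache lines and stops at the first candidate that would fall into another 4 KiB page. Because the check only compares page-aligned addresses, staying inside one page means the prefetcher never needs a new translation, which is why the question of virtual versus physical contiguity does not arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u
#define PAGE_FRAME(addr) ((addr) & ~(uint64_t)0xFFF) /* start of the 4 KiB page */

/* Hypothetical model: after a miss at miss_addr, propose up to `ahead`
   following cache lines, but never one in another 4 KiB page.
   Writes the candidates to out and returns how many there are. */
static size_t prefetch_candidates(uint64_t miss_addr, unsigned ahead,
                                  uint64_t *out, size_t out_cap) {
    assert(out != NULL || out_cap == 0);
    uint64_t line = miss_addr & ~(uint64_t)(LINE_BYTES - 1);
    size_t n = 0;
    for (unsigned i = 1; i <= ahead && n < out_cap; i++) {
        uint64_t cand = line + (uint64_t)i * LINE_BYTES;
        if (PAGE_FRAME(cand) != PAGE_FRAME(miss_addr))
            break; /* next page: wait for a demand load there */
        out[n++] = cand;
    }
    return n;
}
```

With `ahead = 2`, a miss at `0x1F80` yields only `0x1FC0`; the line at `0x2000` is left for a demand load to the new page.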

The paragraph above talks about the "unified last-level cache", but the L1d is no different. From section 2.3.5.4, Data Prefetching:

Data Prefetch to L1 Data Cache

Data prefetching is triggered by load operations when the following conditions are met:

  • [...]

  • The prefetched data is within the same 4K byte page as the load instruction that triggered it.


Or in the L2:

The following two hardware prefetchers fetched data from memory to the L2 cache and last level cache:

Spatial Prefetcher: [...]

Streamer: This prefetcher monitors read requests from the L1 cache for ascending and descending sequences of addresses. Monitored read requests include L1 DCache requests initiated by load and store operations and by the hardware prefetchers, and L1 ICache requests for code fetch. When a forward or backward stream of requests is detected, the anticipated cache lines are prefetched. Prefetched cache lines must be in the same 4K page.


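However, the processor may still fetch paging data (address translations) on behalf of prefetches. From the Intel® 64 and IA-32 Architectures Software Developer's Manuals, Volume 3A, 4.10.2.3, Details of TLB Use: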

The processor may cache translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path.


Volume 3A, 4.10.3.1, Caches for Paging Structures:

The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path.


I know you asked about hardware prefetching, but you should be able to use software prefetching for data (not instructions):

In older microarchitectures, PREFETCH causing a Data Translation Lookaside Buffer (DTLB) miss would be dropped. In processors based on Nehalem, Westmere, Sandy Bridge, and newer microarchitectures, Intel Core 2 processors, and Intel Atom processors, PREFETCH causing a DTLB miss can be fetched across a page boundary.
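A minimal software-prefetch sketch in C, assuming GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` builtin typically emits a `PREFETCHh` instruction on x86-64. `PREFETCH_DISTANCE` and `sum_with_prefetch` are hypothetical. A distance of 1024 eight-byte elements (8 KiB) always targets a different 4 KiB page than the current load, so on the processors listed in the quote a DTLB miss should not cause those prefetches to be dropped. The best distance depends on the workload and should be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value, in elements: 1024 * 8 bytes = 8 KiB ahead,
   which is always in a different 4 KiB page than a[i]. */
#define PREFETCH_DISTANCE 1024u

/* Sum an array while issuing software prefetches into later pages.
   __builtin_prefetch is only a hint and does not fault; the bounds check
   avoids forming a pointer past the end of the array. */
static uint64_t sum_with_prefetch(const uint64_t *a, size_t n) {
    assert(a != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /* read */,
                               3 /* high temporal locality */);
        sum += a[i];
    }
    return sum;
}
```

The result is identical with or without the prefetch; only the timing can change, so any benefit has to be confirmed with a benchmark on the target machine.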

A similar question, "performance - Is cache prefetching done in hardware address space or virtual address space?", can be found on Stack Overflow: https://stackoverflow.com/questions/42983439/
