
linux - Mapping an MMIO region as write-back does not work

Reposted. Author: 太空狗. Updated: 2023-10-29 11:11:54

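I want all read and write requests to a PCIe device to be cached by the CPU caches. However, it does not work as I expected.

These are my assumptions about a write-back MMIO region.

  1. Writes to the PCIe device happen only on cache write-back.
  2. The TLP payload size is the cache block size (64B).

However, the captured TLPs do not match my assumptions.

  1. A write to the PCIe device happens on every write to the MMIO region.
  2. The TLP payload size is 1B.

I write 8 bytes of 0xff to the MMIO region with the following user-space program and device driver.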

Part of the user program

struct pcie_ioctl ioctl_control;
ioctl_control.bar_select = BAR_ID;
ioctl_control.num_bytes_to_write = atoi(argv[1]);
if (ioctl(fd, IOCTL_WRITE_0xFF, &ioctl_control) < 0) {
    printf("ioctl failed\n");
}

Part of the device driver

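case IOCTL_WRITE_0xFF:
{
    int i;
    char *buff;
    struct pci_cdev_struct *pci_cdev = pci_get_drvdata(fpga_pcie_dev.pci_device);

    /* Fetch bar_select and num_bytes_to_write from user space. */
    if (copy_from_user(&ioctl_control, (void __user *)arg, sizeof(ioctl_control)))
        return -EFAULT;

    /* Build a source buffer filled with 0xff. */
    buff = kmalloc(sizeof(char) * ioctl_control.num_bytes_to_write, GFP_KERNEL);
    if (!buff)
        return -ENOMEM;
    for (i = 0; i < ioctl_control.num_bytes_to_write; i++) {
        buff[i] = 0xff;
    }

    /* Plain memcpy() into the mapped BAR; this is the write that is captured. */
    memcpy(pci_cdev->bar[ioctl_control.bar_select], buff, ioctl_control.num_bytes_to_write);
    kfree(buff);
    break;
}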
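
The 1B payloads reported below for the uncacheable case are consistent with the copy being executed as byte-sized stores: an uncacheable store is not combined with its neighbours, so each store instruction is expected to produce its own write TLP. A minimal sketch of the alternative, assuming the destination is 8-byte aligned, is to issue one 64-bit store per 8 bytes. The helper name `fill_ff_qwords` is hypothetical, and the sketch runs against ordinary memory so that it compiles in user space; in a driver the store would normally go through an MMIO accessor such as `writeq()`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill `len` bytes at `dst` with 0xff, using 64-bit
 * stores where possible. Returns the number of store instructions issued.
 * Under an uncacheable mapping that count is the expected number of write
 * TLPs (an assumption, not a measured result).
 */
static size_t fill_ff_qwords(volatile void *dst, size_t len)
{
    volatile uint64_t *q = (volatile uint64_t *)dst;
    volatile uint8_t *b;
    size_t stores = 0;

    /* One 8-byte store per full quadword. */
    while (len >= sizeof(uint64_t)) {
        *q++ = UINT64_MAX;
        len -= sizeof(uint64_t);
        stores++;
    }

    /* Any remaining tail falls back to byte stores. */
    b = (volatile uint8_t *)q;
    while (len > 0) {
        *b++ = 0xff;
        len--;
        stores++;
    }
    return stores;
}
```

For the 8-byte write used in this question, the helper issues a single store instead of eight.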

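I modified the MTRRs to make the corresponding MMIO region write-back. The MMIO region starts at 0x0c7300000 and its length is 0x100000 (1MB). The following are the cat /proc/mtrr results for the different policies. Note that I made each region exclusive.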

Uncacheable

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: uncachable
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable

Write-combining

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-combining
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable

Write-back

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-back
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
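
The listings above do not show how the type of reg08 was changed. One documented way is to write a line of the form `base=... size=... type=...` to /proc/mtrr as root. The sketch below builds such a line and first checks the variable-range MTRR constraints (size is a power of two of at least 4KB, base is aligned to the size). The function name `format_mtrr_request` is hypothetical, and this is not necessarily the method used in the question.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a /proc/mtrr request line.
 * Returns 0 on success, -1 if the range violates the variable-range
 * MTRR constraints or the output buffer is too small.
 */
static int format_mtrr_request(char *out, size_t out_len,
                               uint64_t base, uint64_t size,
                               const char *type)
{
    int n;

    /* Size must be a power of two and at least one 4KB page. */
    if (size < 0x1000 || (size & (size - 1)) != 0)
        return -1;
    /* Base must be aligned to the size. */
    if ((base & (size - 1)) != 0)
        return -1;

    n = snprintf(out, out_len, "base=0x%llx size=0x%llx type=%s\n",
                 (unsigned long long)base, (unsigned long long)size, type);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Called with base 0xc7300000, size 0x100000 and type "write-back", it produces the line `base=0xc7300000 size=0x100000 type=write-back`. An existing register that covers the same range may have to be removed first, for example with the documented `disable=N` form.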

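The following are waveform captures of an 8B write under the different policies. I used an Integrated Logic Analyzer (ILA) to capture these waveforms. Watch pcie_endpoint_litepcietlpdepacketizer_tlp_req_payload_dat while pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid is set. You can get the number of packets by counting the assertions of pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid in these waveform examples.

  1. Uncacheable: link -> correct, 1B x 8 packets
  2. Write-combining: link -> correct, 8B x 1 packet
  3. Write-back: link -> unexpected, 1B x 8 packets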
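
The counting rule described above can be sketched in C for a capture exported as an array of samples. This assumes that each request occupies exactly one cycle with `tlp_req_valid` set. The `ila_sample` struct and its shortened field names are hypothetical stand-ins for the ILA signals, not something litepcie or the ILA tooling provides.

```c
#include <stddef.h>
#include <stdint.h>

/* One captured clock cycle; a hypothetical layout for exported ILA data. */
struct ila_sample {
    uint8_t  tlp_req_valid;        /* 1 while a request is presented */
    uint64_t tlp_req_payload_dat;  /* payload data, meaningful only when valid */
};

/* Count the cycles in which tlp_req_valid is set. */
static size_t count_requests(const struct ila_sample *samples, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (samples[i].tlp_req_valid)
            count++;
    }
    return count;
}
```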

The system configuration is as follows.

  • CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • OS: Linux kernel 4.15.0-38
  • PCIe device: Xilinx FPGA KC705 programmed with litepcie

Related links

  1. Generating a 64-byte read PCIe TLP from an x86 CPU
  2. How to Implement a 64B PCIe* Burst Transfer on Intel® Architecture
  3. Write Combining Buffer Out of Order Writes and PCIe
  4. Do Ryzen support write-back caching for Memory Mapped IO (through PCIe interface)?
  5. MTRR (Memory Type Range Register) control
  6. PATting Linux
  7. Down to the TLP: How PCI express devices talk (Part I)

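Best answer

In short, it seems that mapping an MMIO region as write-back does not work by design.

If anyone finds that it is possible, please post an answer.

I came across John McCalpin's articles and answers. First, mapping an MMIO region as write-back is not possible. Second, a workaround is possible on some processors.

  1. Mapping an MMIO region as write-back is not possible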

    Quote from this link

    FYI: The WB type will not work with memory-mapped IO. You can program the bits to set up the mapping as WB, but the system will crash as soon as it gets a transaction that it does not know how to handle. It is theoretically possible to use WP or WT to get cached reads from MMIO, but coherence has to be handled in software.

    Quote from this link

    Only when I set both PAT and MTRR to WB does the kernel crash

  2. A workaround is possible on some processors

    Notes on Cached Access to Memory-Mapped IO Regions, John McCalpin

    There is one set of mappings that can be made to work on at least some x86-64 processors, and it is based on mapping the MMIO space twice. Map the MMIO range with a set of attributes that allow write-combining stores (but only uncached reads). Map the MMIO range a second time with a set of attributes that allow cache-line reads (but only uncached, non-write-combined stores).
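
    The access discipline in that quote can be sketched as follows: stores always go through one view of the range, loads always go through the other. This is only an illustration. The `dual_map` struct and both function names are hypothetical, how the two views are created is left out, and flushing the read view with CLFLUSH before every load is an assumption about how the software-managed coherence could be done. The sketch runs against ordinary memory, so it shows the structure and not the hardware behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence, _mm_sfence */
#define FLUSH_LINE(p)  _mm_clflush(p)
#define STORE_FENCE()  _mm_sfence()
#define FULL_FENCE()   _mm_mfence()
#else
/* Stand-ins so the sketch still compiles on other architectures. */
#define FLUSH_LINE(p)  ((void)(p))
#define STORE_FENCE()  __sync_synchronize()
#define FULL_FENCE()   __sync_synchronize()
#endif

#define CACHE_LINE 64u

/* Hypothetical handle: the same MMIO range mapped twice. */
struct dual_map {
    volatile uint8_t *wr;  /* write-combining view, used only for stores */
    const uint8_t *rd;     /* cacheable-read view, used only for loads */
};

/* Store through the write-combining view, then drain the WC buffers. */
static void dual_write(struct dual_map *m, size_t off,
                       const void *src, size_t len)
{
    const uint8_t *s = (const uint8_t *)src;
    size_t i;

    for (i = 0; i < len; i++)
        m->wr[off + i] = s[i];
    STORE_FENCE();
}

/* Drop possibly stale lines from the read view, then load through it. */
static void dual_read(struct dual_map *m, size_t off, void *dst, size_t len)
{
    uintptr_t p = (uintptr_t)(m->rd + off) & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)(m->rd + off + len);

    for (; p < end; p += CACHE_LINE)
        FLUSH_LINE((const void *)p);
    FULL_FENCE();
    memcpy(dst, m->rd + off, len);
}
```

    Keeping the two pointers in one struct makes it harder to load through the store view, or store through the load view, by accident.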

Regarding "linux - Mapping an MMIO region as write-back does not work", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53311131/
