
linux - How can I avoid high CPU usage when reading/writing a character device?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 01:13:10

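I need to write a Linux kernel driver for a PCIe device with SRAM.

As a first attempt, I wrote a driver that accesses the SRAM over PCIe through a character device.

Everything works as expected, but there is one problem. The SRAM is slow: reading or writing 1 MB takes about 2 seconds, which is a hardware limitation. The CPU is 100% busy during the read/write, which is a problem. I don't need speed; reads and writes can be slow, but why do they consume so much CPU?

The buffer is initialized with pci_iomap: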

  g_mmio_buffer[0] = pci_iomap(pdev, SRAM_BAR_H, g_mmio_length);

The read/write functions look like this:

static ssize_t dev_read(struct file *fp, char *buf, size_t len, loff_t *off) {
    unsigned long rval;
    size_t copied;

    rval = copy_to_user(buf, g_mmio_buffer[SRAM_BAR] + *off, len);

    if (rval < 0) return -EFAULT;

    copied = len - rval;
    *off += copied;

    return copied;
}

static ssize_t dev_write(struct file *fp, const char *buf, size_t len, loff_t *off) {
    unsigned long rval;
    size_t copied;

    rval = copy_from_user(g_mmio_buffer[SRAM_BAR] + *off, buf, len);

    if (rval < 0) return -EFAULT;

    copied = len - rval;
    *off += copied;

    return copied;
}

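The problem is the high CPU usage. What should I do?

Should I rewrite the driver to use a block device instead of a character device?

Would that allow the CPU to work on another process while the data is being read or written?

Best answer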

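As @0andriy pointed out, you should not access iomem directly. There are functions such as memcpy_toio() and memcpy_fromio() that can copy between iomem and normal memory, but they only work with kernel virtual addresses.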
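Because memcpy_fromio() needs a kernel virtual address on the normal-memory side, the straightforward way to reach user space is through an intermediate ("bounce") buffer: copy a chunk from iomem into ordinary kernel memory, then pass that chunk to copy_to_user(). The sketch below is a user-space model of that loop, not driver code. `model_memcpy_fromio()`, `model_copy_to_user()`, `model_read()` and the `CHUNK` size are hypothetical names chosen for illustration. Like copy_to_user(), the stand-in returns the number of bytes it could not copy, so the result is never negative.

```c
#include <stddef.h>
#include <string.h>

enum { CHUNK = 256 };   /* arbitrary bounce-buffer size (assumption) */

/* Stand-in for memcpy_fromio(): a real driver would read device memory. */
static void model_memcpy_fromio(void *to, const void *from, size_t n)
{
    memcpy(to, from, n);
}

/*
 * Stand-in for copy_to_user(): returns the number of bytes NOT copied.
 * 'writable' models how many destination bytes are accessible.
 */
static size_t model_copy_to_user(void *to, const void *from, size_t n,
                                 size_t writable)
{
    size_t ok = n < writable ? n : writable;

    memcpy(to, from, ok);
    return n - ok;
}

/*
 * Copy 'len' bytes from modelled iomem to a modelled user buffer through
 * a bounce buffer. Returns the number of bytes copied (may be short).
 */
size_t model_read(void *ubuf, size_t writable, const void *iomem, size_t len)
{
    unsigned char bounce[CHUNK];
    size_t done = 0;

    while (done < len) {
        size_t part = len - done < CHUNK ? len - done : CHUNK;
        size_t left;

        model_memcpy_fromio(bounce, (const unsigned char *)iomem + done, part);
        left = model_copy_to_user((unsigned char *)ubuf + done, bounce, part,
                                  writable - done);
        done += part - left;
        if (left)
            break;  /* a driver would stop here: short read or -EFAULT */
    }
    return done;
}
```

This variant is simpler but costs one extra copy per chunk. The page-pinning approach described in the rest of this answer avoids that extra copy.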

NOTE: The use of get_user_pages_fast(), set_page_dirty_lock() and put_page() described below should be changed for Linux kernel version 5.6 onwards. The required changes are described later.

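To copy from a user-space address to iomem without using an intermediate data buffer, the user-space memory pages need to be "pinned" into physical memory. This can be done with get_user_pages_fast(). However, the pinned pages may be in "high memory" (highmem), outside the kernel's permanently mapped memory. Such pages need to be temporarily mapped into kernel virtual address space for a short time using kmap_atomic(). (There are rules governing the use of kmap_atomic(), and there are other functions for longer-term mapping of highmem. See the highmem documentation for details.)

Once a user-space page has been mapped into kernel virtual address space, memcpy_toio() and memcpy_fromio() can be used to copy between that page and iomem.

A page temporarily mapped by kmap_atomic() needs to be unmapped by kunmap_atomic().

User memory pages pinned by get_user_pages_fast() need to be unpinned individually by calling put_page(), but if the page memory has been written to (e.g. by memcpy_fromio()), it must first be marked "dirty" by set_page_dirty_lock() before calling put_page().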

Note: Change for kernel version 5.6 onwards.

  1. The call to get_user_pages_fast() should be changed to pin_user_pages_fast().
  2. Dirty pages pinned by pin_user_pages_fast() should be unpinned by unpin_user_pages_dirty_lock() with the last argument set true.
  3. Clean pages pinned by pin_user_pages_fast() should be unpinned by unpin_user_page(), unpin_user_pages(), or unpin_user_pages_dirty_lock() with the last argument set false.
  4. put_page() must not be used to unpin pages pinned by pin_user_pages_fast().
  5. For code to be compatible with earlier kernel versions, the availability of pin_user_pages_fast(), unpin_user_page(), etc. can be determined by whether the FOLL_PIN macro has been defined by #include <linux/mm.h>.

Putting all that together, the following functions can be used to copy between user memory and iomem:

#include <linux/kernel.h>
#include <linux/uaccess.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/io.h>

/**
 * my_copy_to_user_from_iomem - copy to user memory from MMIO
 * @to: destination in user memory
 * @from: source in remapped MMIO
 * @n: number of bytes to copy
 * Context: process
 *
 * Returns number of uncopied bytes.
 */
long my_copy_to_user_from_iomem(void __user *to, const void __iomem *from,
                                unsigned long n)
{
    might_fault();
    if (!access_ok(to, n))
        return n;
    while (n) {
        enum { PAGE_LIST_LEN = 32 };
        struct page *page_list[PAGE_LIST_LEN];
        unsigned long start;
        unsigned int p_off;
        unsigned int part_len;
        int nr_pages;
        int i;

        /* Determine pages to do this iteration. */
        p_off = offset_in_page(to);
        start = (unsigned long)to - p_off;
        nr_pages = min_t(int, PAGE_ALIGN(p_off + n) >> PAGE_SHIFT,
                         PAGE_LIST_LEN);
        /* Lock down (for write) user pages. */
#ifdef FOLL_PIN
        nr_pages = pin_user_pages_fast(start, nr_pages, FOLL_WRITE, page_list);
#else
        nr_pages = get_user_pages_fast(start, nr_pages, FOLL_WRITE, page_list);
#endif
        if (nr_pages <= 0)
            break;

        /* Limit number of bytes to end of locked-down pages. */
        part_len =
            min(n, ((unsigned long)nr_pages << PAGE_SHIFT) - p_off);

        /* Copy from iomem to locked-down user memory pages. */
        for (i = 0; i < nr_pages; i++) {
            struct page *page = page_list[i];
            unsigned char *p_va;
            unsigned int plen;

            plen = min((unsigned int)PAGE_SIZE - p_off, part_len);
            p_va = kmap_atomic(page);
            memcpy_fromio(p_va + p_off, from, plen);
            kunmap_atomic(p_va);
#ifndef FOLL_PIN
            set_page_dirty_lock(page);
            put_page(page);
#endif
            to = (char __user *)to + plen;
            from = (const char __iomem *)from + plen;
            n -= plen;
            part_len -= plen;
            p_off = 0;
        }
#ifdef FOLL_PIN
        unpin_user_pages_dirty_lock(page_list, nr_pages, true);
#endif
    }
    return n;
}

/**
 * my_copy_from_user_to_iomem - copy from user memory to MMIO
 * @to: destination in remapped MMIO
 * @from: source in user memory
 * @n: number of bytes to copy
 * Context: process
 *
 * Returns number of uncopied bytes.
 */
long my_copy_from_user_to_iomem(void __iomem *to, const void __user *from,
                                unsigned long n)
{
    might_fault();
    if (!access_ok(from, n))
        return n;
    while (n) {
        enum { PAGE_LIST_LEN = 32 };
        struct page *page_list[PAGE_LIST_LEN];
        unsigned long start;
        unsigned int p_off;
        unsigned int part_len;
        int nr_pages;
        int i;

        /* Determine pages to do this iteration. */
        p_off = offset_in_page(from);
        start = (unsigned long)from - p_off;
        nr_pages = min_t(int, PAGE_ALIGN(p_off + n) >> PAGE_SHIFT,
                         PAGE_LIST_LEN);
        /* Lock down (for read) user pages. */
#ifdef FOLL_PIN
        nr_pages = pin_user_pages_fast(start, nr_pages, 0, page_list);
#else
        nr_pages = get_user_pages_fast(start, nr_pages, 0, page_list);
#endif
        if (nr_pages <= 0)
            break;

        /* Limit number of bytes to end of locked-down pages. */
        part_len =
            min(n, ((unsigned long)nr_pages << PAGE_SHIFT) - p_off);

        /* Copy from locked-down user memory pages to iomem. */
        for (i = 0; i < nr_pages; i++) {
            struct page *page = page_list[i];
            unsigned char *p_va;
            unsigned int plen;

            plen = min((unsigned int)PAGE_SIZE - p_off, part_len);
            p_va = kmap_atomic(page);
            memcpy_toio(to, p_va + p_off, plen);
            kunmap_atomic(p_va);
#ifndef FOLL_PIN
            put_page(page);
#endif
            to = (char __iomem *)to + plen;
            from = (const char __user *)from + plen;
            n -= plen;
            part_len -= plen;
            p_off = 0;
        }
#ifdef FOLL_PIN
        unpin_user_pages(page_list, nr_pages);
#endif
    }
    return n;
}
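
The page arithmetic in the two functions above (the offset within the first page, the number of pages to pin, and the per-page copy length) can be checked in user space. The following is a minimal sketch of that arithmetic only, assuming 4 KiB pages. `MODEL_PAGE_SHIFT`, `model_offset_in_page()`, `model_pages_spanned()` and `model_split()` are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12                        /* assumes 4 KiB pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Same idea as the kernel's offset_in_page(): offset of addr in its page. */
static unsigned long model_offset_in_page(unsigned long addr)
{
    return addr & (MODEL_PAGE_SIZE - 1);
}

/* Number of pages touched by n bytes starting at addr (n > 0). */
unsigned long model_pages_spanned(unsigned long addr, unsigned long n)
{
    unsigned long p_off = model_offset_in_page(addr);

    /* Equivalent to PAGE_ALIGN(p_off + n) >> PAGE_SHIFT. */
    return (p_off + n + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/*
 * Fill lens[] with the number of bytes copied in each page.
 * Returns the number of entries written (at most max_pages).
 */
size_t model_split(unsigned long addr, unsigned long n,
                   unsigned long *lens, size_t max_pages)
{
    unsigned long p_off = model_offset_in_page(addr);
    size_t i = 0;

    while (n && i < max_pages) {
        unsigned long room = MODEL_PAGE_SIZE - p_off;
        unsigned long plen = n < room ? n : room;

        lens[i++] = plen;
        n -= plen;
        p_off = 0;  /* every page after the first starts at offset 0 */
    }
    return i;
}
```

For example, a 5000-byte transfer starting at offset 0xF00 within a page touches three pages, copied as chunks of 256, 4096 and 648 bytes.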

Secondly, you might be able to speed up memory access by mapping the iomem as "write-combined", replacing pci_iomap() with pci_iomap_wc().

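Thirdly, the only real way to avoid tying up the CPU while accessing slow memory is to not use the CPU and to use DMA transfers instead. The details depend very much on the bus-mastering DMA capabilities of your PCIe device (if it has any). User memory pages would still need to be pinned during the DMA transfer (e.g. by get_user_pages_fast() or pin_user_pages_fast(), as appropriate), but would not need to be temporarily mapped by kmap_atomic().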

This post is about linux - How can I avoid high CPU usage when reading/writing a character device? A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58413297/
