Python read/write/seek operations under the hood

Reposted by: 可可西里 | Updated: 2023-11-01 11:51:37

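While creating a character device on a Linux system, I use Python and its basic file operations to interact with it.

After several crashes, I started printing debug messages and noticed a strange behavior: Python seems to "optimize" file operations in an odd way.

Let's look at an example; here is the basic code for the interaction and its output:

Kernel module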

// Several includes and kernel module initialization

static ssize_t dev_read(struct file *filep, char *buffer, size_t len, long long *offset){
    printk(KERN_INFO "[DEBUGGER] - dev_read with len: %zu, offset: 0x%llx.\n", len, *offset); // %zu matches size_t
    return len;
}

static ssize_t dev_write(struct file *filep, const char *buffer, size_t len, long long *offset){
    printk(KERN_INFO "[DEBUGGER] - dev_write with len: %zu, offset: 0x%llx.\n", len, *offset); // %zu matches size_t
    return len;
}

static long long dev_llseek(struct file *filep, long long offset, int orig){
    printk(KERN_INFO "[DEBUGGER] - dev_llseek with offset: 0x%llx, orig: %d\n", offset, orig);
    return offset;
}

static int dev_release(struct inode *inodep, struct file *filep){
    return 0; // Success
}

static int dev_open(struct inode *inodep, struct file *filep){
    return 0; // Success
}

static struct file_operations fops =
{
   .open = dev_open,
   .read = dev_read,
   .write = dev_write,
   .release = dev_release,
   .llseek = dev_llseek,
};

int init_module(void){
   // Code to create character device
   return 0;
}

void cleanup_module(void){
   // Code to delete character device
}

Python

with open("/dev/chardevice", "r+b") as character:
   character.seek(1)
   character.read(4)
   character.seek(0x7f123456)
   character.read(20)
   character.write(b"\xff" * 4)  # bytes literal: required in binary mode on Python 3, also valid on Python 2

Output

# seek(1)
[DEBUGGER] - dev_llseek with offset: 0x0, orig: 0
[DEBUGGER] - dev_read with len: 1, offset: 0x0.
[DEBUGGER] - dev_llseek with offset: 0x1, orig: 0
# read(4)
[DEBUGGER] - dev_read with len: 4, offset: 0x0.
# seek(0x7f123456)
[DEBUGGER] - dev_llseek with offset: 0x7f123000, orig: 0
[DEBUGGER] - dev_read with len: 1110, offset: 0x0.
# read(20)
[DEBUGGER] - dev_read with len: 4096, offset: 0x0.
# write("\xff" * 4)
[DEBUGGER] - dev_write with len: 4, offset: 0x0.

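Clearly, the basic file operations do not translate directly into the same operations on the file. The most obvious examples are the seek to 0x7f123000 instead of 0x7f123456, and the read of 4096 bytes when only 20 bytes were requested.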
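
A quick check of the logged numbers, assuming a 4096-byte block size (suggested by the 4096-byte read, but not stated anywhere in the log). The names `BLOCK`, `requested`, `aligned` and `remainder` are only illustrative:

```python
BLOCK = 4096  # assumed buffer/block size, inferred from the 4096-byte read

requested = 0x7f123456               # offset passed to character.seek()
aligned = requested & ~(BLOCK - 1)   # round down to a multiple of BLOCK
remainder = requested - aligned      # bytes between the aligned and requested offsets

print(hex(aligned))   # 0x7f123000, the offset seen by dev_llseek
print(remainder)      # 1110, the length seen by the following dev_read
```

Under that assumption, both the odd seek offset and the 1110-byte read match a seek to the start of a block followed by a read up to the requested position.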

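This raises the following questions:

Why is this a feature at all?
What optimization does it achieve, given that most of it does not look like a good prediction of the "next operation"?
Is it documented anywhere, so that one can know what to expect when writing the read/write functions in advance?
Apart from pure interest in this area, I would still like to use Python for easier access, so is there any way to disable this optimization and force Python to behave like C code performing these operations?

Thanks!

Best answer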

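Python's file objects are in fact wrappers around C FILE* objects (this holds for Python 2; Python 3's io module implements its own buffering layer instead), so they are buffered streams. Because of this buffering, Python does not translate file operations into system calls with the same arguments; instead it tries to optimize the time spent on requests, both current and future ones.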
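
A small sketch of the effect, using Python 3's io module rather than the FILE* layer. `LoggingRaw` is a hypothetical stand-in for the device, not part of any library: it only records how many bytes each raw read asks for, much like the printk calls in the kernel module. The 4096-byte buffer size is an assumption chosen to match the log.

```python
import io


class LoggingRaw(io.RawIOBase):
    """Hypothetical fake device that records the size of every raw read."""

    def __init__(self):
        super().__init__()
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        n = len(b)
        self.requests.append(n)  # what a driver would see as `len`
        b[:n] = b"\x00" * n      # pretend the device always has data
        return n


# Buffered access: the buffer layer decides how much to ask the device for.
raw = LoggingRaw()
buffered = io.BufferedReader(raw, buffer_size=4096)
data = buffered.read(20)
buffered_requests = list(raw.requests)

# Unbuffered access: the request is passed through with the same size.
raw2 = LoggingRaw()
direct = raw2.read(20)
unbuffered_requests = list(raw2.requests)

print(len(data), buffered_requests)
print(len(direct), unbuffered_requests)
```

On CPython 3 the buffered reader should ask the fake device for a full buffer even though the caller wanted 20 bytes, while the raw object forwards the 20-byte request unchanged.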

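The open() function accepts a buffering argument as its third parameter. Passing 0 should disable buffering, so that Python passes every file request straight through to the underlying system: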