java - Why does Files.readAllBytes first read with a bufsize of 1? - 6ren


Reposted by: 塔克拉玛干, updated: 2023-11-03 00:10:39

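I'm writing a simple Linux USB character driver that allows reading a short string from the device node it creates.

It works fine, but I noticed a difference between reading from the device node with cat and reading from a Java program with Files.readAllBytes.

When reading with cat, the first call to the file_operations.read function is passed a buffer of size 131072, and the 5-byte string is copied: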

kernel: [46863.186331] usbtherm: Device was opened
kernel: [46863.186407] usbtherm: buffer: 131072, read: 5, offset: 5
kernel: [46863.186444] usbtherm: done, returning 0
kernel: [46863.186481] usbtherm: Device was released

When reading with Files.readAllBytes, the first call is passed a buffer of size 1, then a buffer of size 8191, into which the remaining 4 bytes are copied:

kernel: [51442.728879] usbtherm: Device was opened
kernel: [51442.729032] usbtherm: buffer: 1, read: 1, offset: 1
kernel: [51442.729102] usbtherm: buffer: 8191, read: 4, offset: 5
kernel: [51442.729140] usbtherm: done, returning 0
kernel: [51442.729158] usbtherm: Device was released

The file_operations.read function (including the debug printks) is:

static ssize_t device_read(struct file *filp, char __user *buffer, size_t length,
        loff_t *offset)
{
    int err = 0;
    size_t msg_len = 0;
    size_t len_read = 0;

    msg_len = strlen(message);

    if (*offset >= msg_len)
    {
        printk(KERN_INFO "usbtherm: done, returning 0\n");
        return 0;
    }

    len_read = msg_len - *offset;
    if (len_read > length)
    {
        len_read = length;
    }

    err = copy_to_user(buffer, message + *offset, len_read);
    if (err)
    {
        err = -EFAULT;
        goto error;
    }

    *offset += len_read;

    printk(KERN_INFO "usbtherm: buffer: %zu, read: %zu, offset: %lld\n",
            length, len_read, *offset);

    return len_read;

error:
    return err;
}

The string read is the same in both cases, so I suppose it doesn't matter; I'm just wondering why the behavior differs.

Best answer

GNU cat

From the source of cat,

      insize = io_blksize (stat_buf);

you can see that the buffer size is determined by coreutils' io_blksize(), which has a rather interesting comment in this regard:

/* As of May 2014, 128KiB is determined to be the minimium blksize to best minimize system call overhead.

So that explains the cat result: 128 KiB is 131072 bytes, which the GNU gurus consider the best size for minimizing system call overhead.
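For comparison, cat's strategy (one fixed 128 KiB buffer, read repeatedly until end of file) can be sketched at the Java level as below. This is only an illustration: `readWithFixedBuffer` is a hypothetical helper, the "hello" input is a placeholder, and the sketch makes no claim about the length the JDK actually passes down to read(2) for a given stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FixedBufferRead {
    // 128 KiB, the minimum block size named in the coreutils comment above
    static final int CAT_BUFFER_SIZE = 128 * 1024;

    /** Hypothetical helper: read until EOF with one fixed-size buffer, as cat does. */
    static byte[] readWithFixedBuffer(InputStream in) throws IOException {
        byte[] buf = new byte[CAT_BUFFER_SIZE];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        // Each call offers the whole buffer; a short count is normal, -1 means EOF
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(CAT_BUFFER_SIZE); // 131072
        byte[] data = readWithFixedBuffer(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(data.length); // 5
    }
}
```

The design point is that the buffer size is chosen up front and never depends on the size the file reports.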

Files.readAllBytes

This one is a bit harder to grasp, at least for simple folks like me. Here is the source of readAllBytes:

public static byte[] readAllBytes(Path path) throws IOException {
    try (SeekableByteChannel sbc = Files.newByteChannel(path);
         InputStream in = Channels.newInputStream(sbc)) {
        long size = sbc.size();
        if (size > (long)MAX_BUFFER_SIZE)
            throw new OutOfMemoryError("Required array size too large");

        return read(in, (int)size);
    }
}

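It shows that readAllBytes simply calls read(InputStream, initialSize), where the initial size is determined by the size of the byte channel. The size() method also has an interesting comment: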

The size of files that are not isRegularFile() files is implementation specific and therefore unspecified.
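To see what size() reports for a non-regular file, it is enough to open a character device the same way readAllBytes does. The sketch assumes a Linux system and uses /dev/null as a stand-in for the driver's device node (both are character devices); on Linux a character device normally reports a size of 0, which is the initialSize readAllBytes then starts from.

```java
import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeviceNodeSize {
    /** Returns the size the byte channel reports, the same call readAllBytes makes. */
    static long channelSize(Path path) throws IOException {
        try (SeekableByteChannel sbc = Files.newByteChannel(path)) {
            return sbc.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dev = Paths.get("/dev/null"); // assumption: Linux, stand-in for the driver's node
        System.out.println("regular file? " + Files.isRegularFile(dev)); // false
        System.out.println("size(): " + channelSize(dev));               // 0 on Linux
    }
}
```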

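Finally, read(InputStream, initialSize) calls InputStream.read(byteArray, offset, length) to do the reading. (The comments in the code come from the original source and are confusing here: since capacity - nread = 0, the first pass through the while loop does not read to EOF.)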

private static byte[] read(InputStream source, int initialSize)
        throws IOException {
    int capacity = initialSize;
    byte[] buf = new byte[capacity];
    int nread = 0;
    int n;
    for (;;) {
        // read to EOF which may read more or less than initialSize (eg: file
        // is truncated while we are reading)
        while ((n = source.read(buf, nread, capacity - nread)) > 0)
            nread += n;

        // if last call to source.read() returned -1, we are done
        // otherwise, try to read one more byte; if that failed we're done too
        if (n < 0 || (n = source.read()) < 0)
            break;

        // one more byte was read; need to allocate a larger buffer
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            capacity = Math.max(capacity << 1, BUFFER_SIZE);
        } else {
            if (capacity == MAX_BUFFER_SIZE)
                throw new OutOfMemoryError("Required array size too large");
            capacity = MAX_BUFFER_SIZE;
        }
        buf = Arrays.copyOf(buf, capacity);
        buf[nread++] = (byte)n;
    }
    return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
}

The declaration of BUFFER_SIZE in Files:

    // buffer size used for reading and writing
    private static final int BUFFER_SIZE = 8192;
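The growth step in the loop, capacity = Math.max(capacity << 1, BUFFER_SIZE) capped at MAX_BUFFER_SIZE, means a zero initial size jumps straight to 8192 and then doubles. The sketch below tabulates the first few capacities; MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8 is an assumption taken from the JDK 8 Files source, since its declaration is not part of the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityGrowth {
    static final int BUFFER_SIZE = 8192;
    static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8; // assumed JDK 8 value

    /** Applies the same growth step as Files.read(InputStream, int). */
    static int grow(int capacity) {
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            return Math.max(capacity << 1, BUFFER_SIZE);
        }
        if (capacity == MAX_BUFFER_SIZE) {
            throw new OutOfMemoryError("Required array size too large");
        }
        return MAX_BUFFER_SIZE;
    }

    /** Capacities reached after each of the first `steps` growth steps. */
    static List<Integer> firstCapacities(int initialSize, int steps) {
        List<Integer> caps = new ArrayList<>();
        int capacity = initialSize;
        for (int i = 0; i < steps; i++) {
            capacity = grow(capacity);
            caps.add(capacity);
        }
        return caps;
    }

    public static void main(String[] args) {
        System.out.println(firstCapacities(0, 4)); // [8192, 16384, 32768, 65536]
    }
}
```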

The documentation and source of InputStream.read(byteArray, offset, length) contain the relevant remark:

If length is zero, then no bytes are read and 0 is returned;
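This zero-length rule is easy to check with any InputStream that still has data available. The snippet uses a ByteArrayInputStream purely for convenience (readAllBytes actually reads through the stream returned by Channels.newInputStream, but the InputStream contract is the same): a zero-length read returns 0 rather than -1 and consumes nothing.

```java
import java.io.ByteArrayInputStream;

public class ZeroLengthRead {
    public static void main(String[] args) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {'a', 'b', 'c'});
        byte[] empty = new byte[0]; // like buf when capacity == 0

        int n = in.read(empty, 0, 0);       // length 0, as capacity - nread is in the first round
        System.out.println(n);              // 0: nothing read, yet not EOF (-1) either
        System.out.println(in.available()); // 3: no bytes were consumed
        System.out.println(in.read());      // 97: the single-byte read() still sees 'a'
    }
}
```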

Since size() returns 0 bytes for your device node, this is what happens in read(InputStream source, int initialSize):

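In the first round of the for (;;) loop:

capacity=0 and nread=0. So source.read in while ((n = source.read(buf, nread, capacity - nread)) > 0) reads 0 bytes into buf and returns 0: the while loop's condition is false, and all it does is set n = 0 as a side effect of the condition.
Since n = 0, source.read() in if (n < 0 || (n = source.read()) < 0) break; reads 1 byte, and the expression evaluates to false: our for loop does not exit. This produces your "buffer: 1, read: 1, offset: 1".
The buffer's capacity is set to BUFFER_SIZE (Math.max(0 << 1, BUFFER_SIZE) = 8192), the single byte read is placed in buf[0], and nread is incremented.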

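In the second round of the for (;;) loop:

We now have capacity=8192 and nread=1, so while ((n = source.read(buf, nread, capacity - nread)) > 0) nread += n; requests up to 8191 bytes starting at offset 1 until source.read returns -1: EOF! This happens after the remaining 4 bytes have been read. This produces your "buffer: 8191, read: 4, offset: 5".
Since n = -1 now, the expression in if (n < 0 || (n = source.read()) < 0) break; short-circuits on n < 0, so our for loop exits without reading any more bytes.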

Finally, the method returns Arrays.copyOf(buf, nread): a copy of the part of the buffer that the bytes were read into.
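The whole trace can be reproduced without a device by feeding the loop quoted above a stream that records every length it is asked for. In this sketch `RecordingInputStream` is a hypothetical stand-in for the device node, "12345" is a hypothetical 5-byte message, the read method is a copy of the JDK 8 excerpt, and MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8 is assumed from the JDK 8 source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadAllBytesTrace {
    private static final int BUFFER_SIZE = 8192;
    private static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8; // assumed JDK 8 value

    /** Hypothetical stand-in for the device node: serves fixed bytes, logs each requested length. */
    static final class RecordingInputStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        final List<Integer> requested = new ArrayList<>();

        RecordingInputStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            requested.add(1); // a single-byte read amounts to a 1-byte buffer
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            requested.add(len);
            if (len == 0) {
                return 0; // InputStream contract: zero length reads nothing
            }
            if (pos >= data.length) {
                return -1; // EOF, like the driver returning 0
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Same loop as the Files.read(InputStream, int) excerpt above. */
    static byte[] read(InputStream source, int initialSize) throws IOException {
        int capacity = initialSize;
        byte[] buf = new byte[capacity];
        int nread = 0;
        int n;
        for (;;) {
            while ((n = source.read(buf, nread, capacity - nread)) > 0) {
                nread += n;
            }
            if (n < 0 || (n = source.read()) < 0) {
                break;
            }
            if (capacity <= MAX_BUFFER_SIZE - capacity) {
                capacity = Math.max(capacity << 1, BUFFER_SIZE);
            } else {
                if (capacity == MAX_BUFFER_SIZE) {
                    throw new OutOfMemoryError("Required array size too large");
                }
                capacity = MAX_BUFFER_SIZE;
            }
            buf = Arrays.copyOf(buf, capacity);
            buf[nread++] = (byte) n;
        }
        return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
    }

    public static void main(String[] args) throws IOException {
        // initialSize 0 mimics what size() reports for the device node
        RecordingInputStream in = new RecordingInputStream("12345".getBytes(StandardCharsets.US_ASCII));
        byte[] result = read(in, 0);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // 12345
        System.out.println(in.requested); // [0, 1, 8191, 8187]
    }
}
```

The lengths 1 and 8191 match the two buffer sizes in the kernel log. The kernel log has no "buffer: 0" line, so the zero-length request apparently never reached the driver, and the final 8187-byte request hits EOF, which would line up with the driver's "done, returning 0" message (that branch does not print the length).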

The original question, "java - Why does Files.readAllBytes first read with a bufsize of 1?", is on Stack Overflow: https://stackoverflow.com/questions/37635183/
