
c - How does `lseek` help determine whether a file is empty?

Reposted. Author: 行者123. Updated: 2023-12-03 16:47:06

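I'm looking at the source code of cat from GNU coreutils, in particular the cycle detection. It compares device and inode numbers, and that works fine, but there is an extra case: the output is allowed to be the same file as the input if the input is empty. Looking at the code, this must be the lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size part. I read the man pages and a discussion I found via git blame, but I still don't quite understand why the call to lseek is needed.
This is the gist of how cat detects whether it would exhaust the disk by copying forever (note that some error checking has been removed for brevity; the full source code is linked above):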

struct stat stat_buf;
fstat(STDOUT_FILENO, &stat_buf);
out_dev = stat_buf.st_dev;
out_ino = stat_buf.st_ino;
out_isreg = S_ISREG (stat_buf.st_mode) != 0;

// ...
// for <infile> in inputs {
    input_desc = open (infile, file_open_mode); // or STDIN_FILENO
    fstat(input_desc, &stat_buf);
    /* Don't copy a nonempty regular file to itself, as that would
       merely exhaust the output device. It's better to catch this
       error earlier rather than later. */
    if (out_isreg
        && stat_buf.st_dev == out_dev && stat_buf.st_ino == out_ino
        && lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size) // <--- This is the important line
      {
        // ...
      }
// } (end of for)

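I have two possible explanations, but both seem somewhat odd.
  • According to some standard (POSIX), a file can be "empty" even though it still contains some information (counted in st_size), and lseek or open respects this by starting at some default offset. I don't know why that would be the case, because empty means empty, right?
  • The comparison is really a "clever" combination of two conditions. This made sense to me at first: if input_desc is STDIN_FILENO and no file is redirected to stdin, lseek fails with ESPIPE (according to the man page) and returns -1. The whole expression would then amount to lseek(...) == -1 || stat_buf.st_size > 0. But that can't be right, because this check only happens when device and inode are the same, and that can only happen if a) stdin and stdout point to the same pty, in which case out_isreg is false, or b) stdin and stdout point to the same file, in which case lseek can't return -1, right?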

I also wrote a small program that prints the return values and errno for the important parts, but nothing stood out to me:
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        struct stat out_stat;
        struct stat in_stat;

        if (fstat(STDOUT_FILENO, &out_stat) < 0)
            exit(1);

        printf("this is written to stdout / into the file\n");

        int fd;
        if (argc > 1)
            fd = open(argv[1], O_RDONLY);
        else
            fd = STDIN_FILENO;

        if (fd < 0 || fstat(fd, &in_stat) < 0)
            exit(1);

        errno = 0; /* lseek leaves errno untouched on success */
        off_t res = lseek(fd, 0, SEEK_CUR);
        fprintf(stderr,
                "errno after lseek = %d, EBADF = %d, EINVAL = %d, EOVERFLOW = %d, "
                "ESPIPE = %d\n",
                errno, EBADF, EINVAL, EOVERFLOW, ESPIPE);

        /* off_t has no portable printf conversion, so cast to long */
        fprintf(stderr, "input:\n\tlseek(...) = %ld\n\tst_size = %ld\n",
                (long)res, (long)in_stat.st_size);

        printf("outsize is %ld", (long)out_stat.st_size);
        return 0;
    }

    $ touch empty
    $ ./a.out < empty > empty
    errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
    input:
    lseek(...) = 0
    st_size = 0
    $ echo x > empty
    $ ./a.out < empty > empty
    errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
    input:
    lseek(...) = 0
    st_size = 0
So my research left my actual question unanswered: how does lseek help determine whether a file is empty in this excerpt from the cat source code?

Best answer

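Here is my attempt at reverse-engineering this. I couldn't find any public discussion that explains why lseek() was put there (nothing else in GNU coreutils does this).
The guiding question is: when is the condition lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size false?
Test case: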

    #!/bin/bash
    # (edited based on comments)

    set -x

    # arrange for cat to start off past the end of a non-empty file

    echo abcdefghi > /tmp/so/catseek/input
    # get the shell to open the input file for reading & writing as file descriptor 7
    exec 7<>/tmp/so/catseek/input
    # read the whole file via that descriptor (but leave it open)
    dd <&7
    # ask linux what the current file position of file descriptor 7 is
    # should be everything dd read, namely 10 bytes, the size of the file
    grep ^pos: /proc/self/fdinfo/7
    # run cat, with pre and post content so that we know how to locate the interesting part
    # "-" will cause cat to reuse its file descriptor 0 rather than creating a new file descriptor
    # the redirections tell the shell to redirect file descriptors 1 and 0 to/from our open file descriptor 7
    # which, as you'll remember, already has a file position of 10 bytes
    strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post <&7 >&7
    # now let's see what's in the file
    cat /tmp/so/catseek/input
with:
    $ cat /tmp/so/catseek/pre
    pre
    $ cat /tmp/so/catseek/post
    post
Result with cat using the condition lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size:
    + test.sh:8:echo abcdefghi
    + test.sh:10:exec
    + test.sh:12:dd
    abcdefghi
    0+1 records in
    0+1 records out
    10 bytes copied, 2.0641e-05 s, 484 kB/s
    + test.sh:15:grep '^pos:' /proc/self/fdinfo/7
    pos: 10
    + test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
    lseek(0, 0, SEEK_CUR) = 14
    +++ exited with 0 +++
    + test.sh:22:cat /tmp/so/catseek/input
    abcdefghi
    pre
    post
Result with cat using the condition 0 < stat_buf.st_size:
    + test.sh:8:echo abcdefghi
    + test.sh:10:exec
    + test.sh:12:dd
    abcdefghi
    0+1 records in
    0+1 records out
    10 bytes copied, 3.6415e-05 s, 275 kB/s
    + test.sh:15:grep '^pos:' /proc/self/fdinfo/7
    pos: 10
    + test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
    ./src/cat: -: input file is output file
    +++ exited with 1 +++
    + test.sh:22:cat /tmp/so/catseek/input
    abcdefghi
    pre
    post
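As you can see, the file position can already be at or past the end of the file when cat starts. Checking only the file size would make cat skip the file and also report a failure, because the code inside the if statement is: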
    error (0, 0, _("%s: input file is output file"), infile);
    ok = false;
    goto contin;
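Using lseek() allows cat to say "oh, the file is the same, and it is not empty, BUT our reads will still come up empty because that is how reading at EOF works, so we can allow this case".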

Regarding "c - How does `lseek` help determine whether a file is empty?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65674534/
