
python - Is the redirected output of a subprocess call getting lost?


I have some Python code that looks roughly like this, using some libraries you may or may not have:

import subprocess

# Open it for writing
vcf_file = open(local_filename, "w")

# Download the region to the file.
subprocess.check_call(["bcftools", "view",
                       options.truth_url.format(sample_name), "-r",
                       "{}:{}-{}".format(ref_name, ref_start, ref_end)],
                      stdout=vcf_file)

# Close the parent process's copy of the file object
vcf_file.close()

# Upload it
file_id = job.fileStore.writeGlobalFile(local_filename)

Basically, I'm launching a child process that is supposed to download some data for me and print it to standard output. I redirect that data to a file, and then, once the subprocess call has returned, I close my handle to the file and copy the file somewhere else.

What I observe is that, sometimes, the tail end of the data I'm expecting doesn't make it into the copy. Now, bcftools may just occasionally fail to write that data, but I'm worried that I might be doing something unsafe and somehow accessing the file after subprocess.check_call() has returned but before the data the child process wrote to standard output has made it onto the disk where I can see it.

Looking at the C++ standard (since bcftools is implemented in C/C++), it appears that when a program exits normally, all open streams (including standard output) are flushed and closed. See the section [lib.support.start.term] here, which describes the behaviour of exit(), called implicitly when main() returns:

--Next, all open C streams (as mediated by the function signatures declared in <cstdio>) with unwritten buffered data are flushed, all open C streams are closed, and all files created by calling tmpfile() are removed.30)

--Finally, control is returned to the host environment. If status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.31)

So before the child process exits, it closes (and therefore flushes) its standard output.
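This flush-on-normal-exit behaviour is easy to see with a small self-contained sketch (not from the original question; it uses a throwaway Python child in place of bcftools, whose output buffering behaves analogously to C stdio here): the child fills a block-buffered stdout with more data than one stdio buffer holds, exits normally, and all of it shows up in the redirected file.

import os
import subprocess
import tempfile

# The child writes 100,000 bytes to its block-buffered stdout and exits
# normally; the implicit flush on exit means none of it is left behind in
# the child's output buffer.
with tempfile.NamedTemporaryFile("w+", delete=False) as out:
    subprocess.check_call(
        ["python3", "-c", "import sys; sys.stdout.write('x' * 100000)"],
        stdout=out)

with open(out.name) as f:
    print(len(f.read()))   # 100000
os.remove(out.name)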

However, the manual page for close(2) on Linux notes that closing a file descriptor does not necessarily guarantee that any data written to it has actually made it to disk:

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)

So it looks like, when a process exits, its standard output stream gets flushed, but if that stream is actually backed by a file descriptor pointing at a file on disk, there is no guarantee that the write to disk has completed. I suspect that may be what is happening here.

So, my actual questions:

  1. Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?

  2. Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?

  3. Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?

Best answer

Yes, it may take minutes before the data is (physically) written to disk. But you can read it back long before that.

Unless you are worried about a power failure or a kernel crash, it does not matter whether the data is on disk. The important part is whether the kernel considers the data written.

It is safe to read from the file as soon as check_call() has returned. If you don't see all the data, it may indicate a bug in bcftools, or that writeGlobalFile() does not upload all the data in the file. You could try to work around the former by disabling block buffering for bcftools' stdout (provide a pseudo-tty, use the unbuffer command-line utility, etc.), as sketched below.
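A sketch of that workaround (assuming the unbuffer utility from the expect package is installed; the file name and bcftools arguments here are placeholders, not the ones from the question):

import subprocess

# unbuffer runs bcftools with a pseudo-tty on stdout, so its stdio layer
# drops block buffering and relays output as it is produced; unbuffer's own
# stdout is what actually gets redirected into the file.
with open("out.vcf", "w") as vcf_file:
    subprocess.check_call(
        ["unbuffer", "bcftools", "view", "input.vcf.gz",
         "-r", "chr1:1000-2000"],
        stdout=vcf_file)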

Q: Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?

Yes. And yes.

Q: Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?

No. fsync() is not enough in the general case, and you probably don't need it anyway (reading the data back is a different issue from making sure it has been written to disk).

Q: Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?

It would be pointless. .flush() flushes a buffer that lives inside the parent process (you can use open(filename, 'wb', 0) to avoid creating an unnecessary buffer in the parent in the first place).

fsync() works on a file descriptor (the child has its own file descriptor). I don't know whether the kernel uses different buffers for different file descriptors referring to the same file on disk. Again, it doesn't matter: if you observe data loss (without a crash), fsync() won't help here.
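For completeness, a sketch of what the flush()/fsync() combination from question 3 would look like (file names are placeholders). As argued above, it should not change what the parent can already read; it only asks the kernel to push its cached pages for the file out to the storage device.

import os
import subprocess

# buffering=0: the parent-side file object has no buffer of its own to flush.
with open("out.vcf", "wb", 0) as vcf_file:
    subprocess.check_call(["bcftools", "view", "input.vcf.gz"],
                          stdout=vcf_file)
    vcf_file.flush()             # effectively a no-op: the parent never wrote through this object
    os.fsync(vcf_file.fileno())  # ask the kernel to commit the file's cached pages to disk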

Q: Just to be clear, I see that you're asserting that the data should indeed be readable by other processes, because the relevant OS buffers are shared between processes. But what's your source for that assertion? Is there a place in a spec or the Linux documentation you can point to that guarantees that those buffers are shared?

Look for "After a write() to a regular file has successfully returned":

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

Regarding python - Is the redirected output of a subprocess call getting lost?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34623639/
