
python - Searching all docx files in a directory with python-docx (batch processing)

Reposted. Author: 行者123. Updated: 2023-12-03 21:32:38

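I have a bunch of Word docx files that all contain the same embedded Excel table. I am trying to extract the same cells from multiple files.

I figured out how to do it with the path hard-coded for a single file: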

from docx import Document

document = Document(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx\006-087-003.docx")
table = document.tables[0]
Project_cell = table.rows[2].cells[2]
paragraph = Project_cell.paragraphs[0]
Project = paragraph.text

print(Project)

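But how do I batch process them? I tried a few variations on listdir, but they did not work for me, and I am too much of a beginner to implement it on my own.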

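Best answer

How you loop through all the files really depends on your project deliverables. Are all of the files in a single folder? Are there other files besides the .docx files?

To cover every case, let's assume there are subdirectories and other files mixed in with your .docx files. For this we will use os.walk() and os.path.splitext().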

import os

from docx import Document

# First, we'll create an empty list to hold the path to all of your docx files
document_list = []

# Now, we loop through every file in the folder "G:\GIS\DESIGN\ROW\ROW_Files\Docx"
# (and all its subfolders) using os.walk(). You could alternatively use os.listdir()
# to get a list of files. That is the simpler, recommended choice if all files are
# in the same folder. Consider that change a small challenge for developing your skills!
for path, subdirs, files in os.walk(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx"):
    for name in files:
        # For each file we find, we need to ensure it is a .docx file before adding
        # it to our list
        if os.path.splitext(name)[1] == ".docx":
            document_list.append(os.path.join(path, name))

# Now create a loop that goes over each file path in document_list, replacing your
# hard-coded path with the variable.
for document_path in document_list:
    document = Document(document_path)  # Change the document being loaded each loop
    table = document.tables[0]
    project_cell = table.rows[2].cells[2]
    paragraph = project_cell.paragraphs[0]
    project = paragraph.text

    print(project)

For further reading, here is the documentation on os.listdir().
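If all of the files really do sit in one folder, the os.listdir() variant mentioned in the answer's code comments might look roughly like the sketch below. The function name list_docx_files is made up for illustration, and the case-insensitive extension check and the skipping of Word's "~$" lock files are extra assumptions that go beyond the original answer.

```python
import os


def list_docx_files(folder):
    """Return full paths of the .docx files directly inside folder (no subfolders).

    Hypothetical helper: the name and the extra checks are illustrative only.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # os.listdir() returns subdirectories too, so keep regular files only.
        # Assumption: skip "~$..." files, which Word creates while a document
        # is open and which are not readable documents.
        if not os.path.isfile(full_path) or name.startswith("~$"):
            continue
        # Assumption: compare the extension case-insensitively (".DOCX" counts).
        if os.path.splitext(name)[1].lower() == ".docx":
            paths.append(full_path)
    return paths
```

The returned list can stand in for document_list in the loop above, for example document_list = list_docx_files(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx").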

Also, it would be best to put your code into a reusable function, but that is another challenge for you!
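One possible shape for such a reusable function is sketched below. It is not the original answerer's code: the names find_docx_files, read_cell_text and read_cell_from_all, the default row and column of 2, and the decision to record None for files that lack the expected table or cell are all assumptions made for illustration.

```python
import os


def find_docx_files(root):
    """Yield the path of every .docx file under root, including subfolders."""
    for path, subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() == ".docx":
                yield os.path.join(path, name)


def read_cell_text(document_path, row=2, col=2, table_index=0):
    """Return the text of the first paragraph in one table cell of a docx file."""
    # Imported here so find_docx_files() stays usable without python-docx installed.
    from docx import Document

    table = Document(document_path).tables[table_index]
    return table.rows[row].cells[col].paragraphs[0].text


def read_cell_from_all(root, reader=read_cell_text):
    """Map each .docx path under root to the value returned by reader(path)."""
    results = {}
    for document_path in find_docx_files(root):
        try:
            results[document_path] = reader(document_path)
        except IndexError as error:
            # Assumption: a file without the expected table, row, or cell is
            # reported and recorded as None instead of stopping the batch.
            results[document_path] = None
            print("Skipped {}: {}".format(document_path, error))
    return results
```

Calling read_cell_from_all(r"G:\GIS\DESIGN\ROW\ROW_Files\Docx") would then return a dictionary keyed by file path. The reader parameter is there so a different extraction function can be swapped in without touching the directory walk.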

Regarding python - searching all docx files in a directory with python-docx (batch processing), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42682648/
