python - Remove personal information from comments in a Word file using Python

Reposted. Author: 太空狗. Updated: 2023-10-30 02:18:01

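I want to remove all personal information from the comments in a Word file.

Removing the author name is not a problem. I did that with the following: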

# Requires the python-docx package.
from docx import Document

document = Document('sampleFile.docx')
core_properties = document.core_properties
core_properties.author = ""
document.save('new-filename.docx')

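But that is not what I need. I want to remove the name of anyone who commented in that Word file.

The way we do this manually is to go to Preferences -> Security -> Remove personal information from this file on save.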

Best answer

If you want to remove personal information from the comments in a .docx file, you have to dig into the file itself.

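A .docx is just a .zip archive containing Word-specific files. We need to overwrite some of its internal files, and the simplest way I could find is to copy all of the files into memory, change whatever we have to change, and then put everything into a new file.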
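To illustrate that a .docx really is a zip archive, here is a minimal sketch that lists the parts stored inside one. The helper names `list_docx_parts` and `has_comments` are made up for this example; it uses only Python's standard `zipfile` module and relies on comments being stored in `word/comments.xml`, the same part the answer edits.

```python
from zipfile import ZipFile


def list_docx_parts(docx_path):
    """Return the names of the files stored inside a .docx archive."""
    with ZipFile(docx_path, "r") as archive:
        return archive.namelist()


def has_comments(docx_path):
    """Report whether the archive contains a comments part."""
    return "word/comments.xml" in list_docx_parts(docx_path)
```

Calling `list_docx_parts('/path/to/your/document.docx')` on a typical document should show entries such as `[Content_Types].xml` and `word/document.xml`. If `has_comments` returns False, there are no comment authors to scrub.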

import os
import re
from zipfile import ZipFile, ZIP_DEFLATED

docx_file_name = '/path/to/your/document.docx'

files = dict()

# Read every file in the archive into the "files" dictionary as raw bytes.
# The archive is closed automatically when the "with" block ends.
with ZipFile(docx_file_name, 'r') as document_as_zip:
    for internal_file in document_as_zip.infolist():
        files[internal_file.filename] = document_as_zip.read(internal_file.filename)

# If there are any comments.
if "word/comments.xml" in files:
    # The comments file is the only one we edit, so it is the only one we decode
    # (Word normally writes it as UTF-8).
    comments = files["word/comments.xml"].decode("utf-8")

    # Change every author to "Unknown Author".
    comments_new = re.sub(r'w:author="[^"]*"', 'w:author="Unknown Author"', comments)

    files["word/comments.xml"] = comments_new.encode("utf-8")

# Remove the old .docx file.
os.remove(docx_file_name)

# Write all files, edited or not, to a new archive with the same name.
with ZipFile(docx_file_name, 'w', ZIP_DEFLATED) as document_as_zip:
    for internal_file_name, data in files.items():
        # Everything is written back as bytes, so images and other
        # binary parts are not corrupted by text decoding.
        document_as_zip.writestr(internal_file_name, data)
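The script above deletes the original file before writing the new one. As a hedged variant, the sketch below wraps the same idea in a function that writes to a separate output path, so the source document is left untouched if something goes wrong. The names `scrub_comment_authors`, `src_path`, `dst_path` and `placeholder` are hypothetical. It assumes `word/comments.xml` uses an ASCII-compatible encoding such as UTF-8, and it keeps every part as raw bytes so that images and other binary parts are copied unchanged.

```python
import re
from xml.sax.saxutils import quoteattr
from zipfile import ZipFile, ZIP_DEFLATED

COMMENTS_PART = "word/comments.xml"
# Matches the author attribute on each comment element, as in the script above.
AUTHOR_ATTR = re.compile(rb'w:author="[^"]*"')


def scrub_comment_authors(src_path, dst_path, placeholder="Unknown Author"):
    """Copy a .docx to dst_path, replacing every comment author name."""
    # quoteattr escapes XML special characters and adds the surrounding quotes.
    replacement = ("w:author=" + quoteattr(placeholder)).encode("utf-8")
    with ZipFile(src_path, "r") as src, ZipFile(dst_path, "w", ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == COMMENTS_PART:
                # A function is used as the replacement so that backslashes
                # in the placeholder are not treated as regex escapes.
                data = AUTHOR_ATTR.sub(lambda match: replacement, data)
            dst.writestr(item.filename, data)
    return dst_path
```

A call such as `scrub_comment_authors('document.docx', 'document-clean.docx')` would leave `document.docx` as it was. Like the original answer, this only rewrites `w:author`, so other attributes on the comment elements are not changed.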

Regarding "python - Remove personal information from comments in a Word file using Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37955062/
