python-pptx - Removing all metadata from a PowerPoint presentation with python-pptx

Reposted. Author: 行者123. Updated: 2023-12-05 03:59:24

I can remove or overwrite some of the metadata (the metadata stored in core.xml) with the following code:

from datetime import datetime

import pptx


def remove_metadata(prs):
    """Overwrite the metadata in core.xml. Metadata stored in app.xml is not touched."""
    props = prs.core_properties
    props.title = 'PowerPoint Presentation'
    props.last_modified_by = 'python-pptx'
    props.revision = 1
    props.modified = datetime.utcnow()
    props.subject = ''
    props.author = 'python-pptx'
    props.keywords = ''
    props.comments = ''
    props.created = datetime.utcnow()
    props.category = ''


# Presentation() expects a .pptx package
prs = pptx.Presentation('my_pres.pptx')
remove_metadata(prs)

This is useful, but other metadata is stored in app.xml, for example Company and Manager. I need to clear those properties as well. How can I edit the app.xml file using python-pptx?
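To see which of these properties are actually present before clearing them, it can help to look inside the package directly. A .pptx file is a ZIP archive, and the extended properties conventionally live at docProps/app.xml. The following is a minimal standard-library sketch and is not part of python-pptx; the function name read_app_properties and the default part name are assumptions made for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_app_properties(pptx_file, part_name="docProps/app.xml"):
    """Return {tag: text} for the simple top-level elements of app.xml.

    `pptx_file` is a path or a binary file-like object. The default part
    name is an assumption: it is the conventional location, not a guarantee.
    """
    with zipfile.ZipFile(pptx_file) as zf:
        root = ET.fromstring(zf.read(part_name))
    props = {}
    for child in root:
        # ElementTree prefixes tag names with '{namespace}'; strip that off.
        tag = child.tag.split("}", 1)[-1]
        # Skip container elements such as HeadingPairs or TitlesOfParts.
        if len(child) == 0:
            props[tag] = child.text or ""
    return props
```

Calling read_app_properties('my_pres.pptx') before and after cleaning gives a quick check of whether Company and Manager are still present.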

Best answer

I found a solution. It is not necessarily the ideal way to handle this, but it seems to work:

import re


def remove_metadata_from_app_xml(prs):
    """There is currently no functionality for handling app.xml, so we
    have to find the part and then alter its blob manually.
    """
    tags_to_remove = ('Company', 'Manager', 'HyperlinkBase')
    for part in prs.part.package.parts:
        if part.partname.endswith('app.xml'):
            app_xml = part.blob.decode('utf-8')
            for tag in tags_to_remove:
                # Non-greedy match so that only this one element is removed
                pattern = f'<{tag}>.*?</{tag}>'
                app_xml = re.sub(pattern, '', app_xml, flags=re.DOTALL)
            part.blob = app_xml.encode('utf-8')
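Regular expressions over XML are brittle: a self-closing element such as `<Company/>`, or one that carries attributes, would not match the pattern. As a possible alternative, the same clean-up can be done on the raw bytes of app.xml with an XML parser. The sketch below uses only the standard library; strip_app_xml_tags is a hypothetical helper, and writing its result back into the part is assumed to work the same way as the last line of the answer.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the extended-properties part (app.xml).
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

# Keep the usual prefixes on output instead of generated ones like ns0.
ET.register_namespace("", EP_NS)
ET.register_namespace("vt", VT_NS)


def strip_app_xml_tags(app_xml_bytes, tags=("Company", "Manager", "HyperlinkBase")):
    """Return app.xml as bytes with the given top-level elements removed."""
    root = ET.fromstring(app_xml_bytes)
    for tag in tags:
        # Only direct children of <Properties> in the extended-properties
        # namespace are considered.
        for elem in root.findall(f"{{{EP_NS}}}{tag}"):
            root.remove(elem)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Note that ElementTree rewrites the XML declaration and namespace declarations, so the output is not byte-identical to the input apart from the removed elements. Whether every Office version accepts the re-serialized part has not been verified here.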

A similar question on this topic (python-pptx - removing all metadata from a PowerPoint presentation with python-pptx) can be found on Stack Overflow: https://stackoverflow.com/questions/57221694/
