
python - Writing an ElementTree directly to a zip with UTF-8 encoding

Reposted. Author: 行者123. Updated: 2023-12-04 01:32:03

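I want to modify a large number of XML files. They are stored in ZIP files. The source XMLs are UTF-8 encoded (at least according to the guess of the file tool on Linux) and have a correct XML declaration: <?xml version='1.0' encoding='UTF-8'?>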

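The target ZIPs and the XMLs they contain should also have a correct XML declaration. However, the approach that seems most obvious (at least to me), using ElementTree.tostring, fails.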

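Here is a self-contained example that should work out of the box.
A short walkthrough:

  • imports
  • preparation (creating src.zip; in my real application these ZIPs are given)
  • the actual work of the program (modifying the XMLs), starting at # read XMLs from zip

Please focus on the lower half, especially # APPROACH 1, # APPROACH 2 and # APPROACH 3:
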
    import os
    import tempfile
    import zipfile
    from xml.etree.ElementTree import Element, ElementTree, parse, tostring

    src_1 = os.path.join(tempfile.gettempdir(), "one.xml")
    src_2 = os.path.join(tempfile.gettempdir(), "two.xml")
    src_zip = os.path.join(tempfile.gettempdir(), "src.zip")
    trgt_appr1_zip = os.path.join(tempfile.gettempdir(), "trgt_appr1.zip")
    trgt_appr2_zip = os.path.join(tempfile.gettempdir(), "trgt_appr2.zip")
    trgt_appr3_zip = os.path.join(tempfile.gettempdir(), "trgt_appr3.zip")

    # file on hard disk that must be used due to ElementTree insufficiencies
    tmp_xml_name = os.path.join(tempfile.gettempdir(), "curr_xml.tmp")

    # prepare src.zip
    tree1 = ElementTree(Element('hello', {'beer': 'good'}))
    tree1.write(os.path.join(tempfile.gettempdir(), "one.xml"), encoding="UTF-8", xml_declaration=True)
    tree2 = ElementTree(Element('scnd', {'äkey': 'a value'}))
    tree2.write(os.path.join(tempfile.gettempdir(), "two.xml"), encoding="UTF-8", xml_declaration=True)

    with zipfile.ZipFile(src_zip, 'a') as src:
        with open(src_1, 'r', encoding="utf-8") as one:
            string_representation = one.read()
        # write to zip
        src.writestr(zinfo_or_arcname="one.xml", data=string_representation.encode("utf-8"))
        with open(src_2, 'r', encoding="utf-8") as two:
            string_representation = two.read()
        # write to zip
        src.writestr(zinfo_or_arcname="two.xml", data=string_representation.encode("utf-8"))
    os.remove(src_1)
    os.remove(src_2)

    # read XMLs from zip
    with zipfile.ZipFile(src_zip, 'r') as zfile:

        updated_trees = []

        for xml_name in zfile.namelist():

            curr_file = zfile.open(xml_name, 'r')
            tree = parse(curr_file)
            # modify tree
            updated_tree = tree
            updated_tree.getroot().append(Element('new', {'newkey': 'new value'}))
            updated_trees.append((xml_name, updated_tree))

        for xml_name, updated_tree in updated_trees:

            # write to target file
            with zipfile.ZipFile(trgt_appr1_zip, 'a') as trgt1_zip, zipfile.ZipFile(trgt_appr2_zip, 'a') as trgt2_zip, zipfile.ZipFile(trgt_appr3_zip, 'a') as trgt3_zip:

                #
                # APPROACH 1 [DESIRED, BUT DOES NOT WORK]: write tree to zip-file
                # encoding in XML declaration missing
                #
                # create byte representation of elementtree
                byte_representation = tostring(element=updated_tree.getroot(), encoding='UTF-8', method='xml')
                # write XML directly to zip
                trgt1_zip.writestr(zinfo_or_arcname=xml_name, data=byte_representation)

                #
                # APPROACH 2 [WORKS IN THEORY, BUT DOES NOT WORK]: write tree to zip-file
                # encoding in XML declaration is faulty (is 'utf8', should be 'utf-8' or 'UTF-8')
                #
                # create byte representation of elementtree
                byte_representation = tostring(element=updated_tree.getroot(), encoding='utf8', method='xml')
                # write XML directly to zip
                trgt2_zip.writestr(zinfo_or_arcname=xml_name, data=byte_representation)

                #
                # APPROACH 3 [WORKS, BUT LACKS PERFORMANCE]: write to file, then read from file, then write to zip
                #
                # write to file
                updated_tree.write(tmp_xml_name, encoding="UTF-8", method="xml", xml_declaration=True)
                # read from file
                with open(tmp_xml_name, 'r', encoding="utf-8") as tmp:
                    string_representation = tmp.read()
                # write to zip
                trgt3_zip.writestr(zinfo_or_arcname=xml_name, data=string_representation.encode("utf-8"))

    os.remove(tmp_xml_name)

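APPROACH 3 works, but it is much more resource-intensive than the other two.
APPROACH 2 is the only way I found to write the ElementTree object with an actual XML declaration, but that declaration turns out to be invalid (utf8 instead of UTF-8/utf-8).
APPROACH 1 would be the most desirable, but it fails when the files are read later in the pipeline, because the XML declaration is missing.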
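
The difference between APPROACH 1 and APPROACH 2 can be seen by looking at what tostring returns for each encoding argument. The following is a minimal sketch using the small 'hello' element from the preparation step; the variable names are only illustrative and are not part of the original code.

```python
from xml.etree.ElementTree import Element, tostring

root = Element('hello', {'beer': 'good'})

# APPROACH 1: with encoding='UTF-8', tostring emits no XML declaration by default
no_declaration = tostring(root, encoding='UTF-8', method='xml')

# APPROACH 2: the alias 'utf8' does trigger a declaration, but the alias is
# copied into the declaration as given
alias_declaration = tostring(root, encoding='utf8', method='xml')

print(no_declaration)
print(alias_declaration)
```

The first value starts directly with the root element, while the second starts with a declaration that names the encoding as utf8.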

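Question: How can I avoid first writing the whole XML to disk, then reading it back, writing it to the zip, and deleting it once the zip is done? What am I missing?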

Best answer

You can use an io.BytesIO object.
This lets you use ElementTree.write while avoiding exporting the tree to disk:

    import zipfile
    from io import BytesIO
    from xml.etree.ElementTree import ElementTree, Element

    tree = ElementTree(Element('hello', {'beer': 'good'}))
    bio = BytesIO()
    tree.write(bio, encoding='UTF-8', xml_declaration=True)
    with zipfile.ZipFile('/tmp/test.zip', 'w') as z:
        z.writestr('test.xml', bio.getvalue())
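
Applied to the loop from the question, the BytesIO idea might look like the sketch below. The function name copy_zip_with_bytesio and its two path parameters are hypothetical, chosen only for illustration, and the appended 'new' element is the same placeholder modification used in the question.

```python
import zipfile
from io import BytesIO
from xml.etree.ElementTree import Element, parse


def copy_zip_with_bytesio(src_path, dst_path):
    """Read each XML from src_path, modify it, and store it in dst_path."""
    with zipfile.ZipFile(src_path, 'r') as src, zipfile.ZipFile(dst_path, 'w') as dst:
        for xml_name in src.namelist():
            with src.open(xml_name, 'r') as member:
                tree = parse(member)
            # placeholder modification, as in the question
            tree.getroot().append(Element('new', {'newkey': 'new value'}))
            # serialize into memory instead of a temporary file on disk
            buffer = BytesIO()
            tree.write(buffer, encoding='UTF-8', xml_declaration=True)
            dst.writestr(xml_name, buffer.getvalue())
```

Each tree is serialized once into an in-memory buffer, so the temporary file from APPROACH 3 is no longer needed.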

If you are using Python 3.6 or later, there is an even shorter solution:
you can get a writable file object from the ZipFile object and pass it to ElementTree.write:

    import zipfile
    from xml.etree.ElementTree import ElementTree, Element

    tree = ElementTree(Element('hello', {'beer': 'good'}))
    with zipfile.ZipFile('/tmp/test.zip', 'w') as z:
        with z.open('test.xml', 'w') as f:
            tree.write(f, encoding='UTF-8', xml_declaration=True)

This also has the advantage that you do not hold multiple copies of the tree in memory, which can be a relevant concern for large trees.
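
To check that a member written this way carries the expected declaration and can be read back, a round trip along the following lines could be used. This is only a sketch: the scratch location zip_path and the variable names are assumptions for illustration, and the element is the 'scnd' example from the question's preparation step.

```python
import os
import tempfile
import zipfile
from xml.etree.ElementTree import Element, ElementTree, parse

# hypothetical scratch location for the check
zip_path = os.path.join(tempfile.mkdtemp(), 'check.zip')

tree = ElementTree(Element('scnd', {'äkey': 'a value'}))
with zipfile.ZipFile(zip_path, 'w') as z:
    # writable member file object (Python 3.6+)
    with z.open('two.xml', 'w') as f:
        tree.write(f, encoding='UTF-8', xml_declaration=True)

# read the member back, once as raw bytes and once through the parser
with zipfile.ZipFile(zip_path, 'r') as z:
    raw = z.read('two.xml')
    with z.open('two.xml') as f:
        reparsed = parse(f)

declaration = raw.splitlines()[0]
print(declaration)
print(reparsed.getroot().attrib)
```

If a bytes object is still preferred, tostring accepts xml_declaration=True from Python 3.8 on, which may be enough to make APPROACH 1 work as written.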

Regarding python - Writing an ElementTree directly to a zip with UTF-8 encoding, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60755697/
