python - How to conditionally get values from XML with ElementTree?

Reposted by: 行者123 | Updated: 2023-12-05 04:36:51

I have an XML file from which I need to extract some values. If a value doesn't exist, I want "N/A" instead.

<?xml version="1.0" encoding="utf-8"?>
<DEF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="FF.xsd">
    <FOLD SERV="TRYING" VERSION="918" PLATFORM="UNIX" GROUP_NAME="UNIX" MODIFIED="False" LAST_UPLOAD="20220117145134UTC" TYPE="1" USED_BY_CODE="0">
        <JOB JOBID="835" APP="TRY" JOBNAME="JOBA" DESC="Extrac607" TYPE="db" FORM="Databases" CM_VER="N/A" MULTY_AGENT="N" VERSION_SERIAL="1" PARENT_FOLDER="UNIX">
            <VARIABLE NAME="%%TYPE_DB" VALUE="Open Query" />
            <VARIABLE NAME="%%APPLOG" VALUE="N" />
            <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
            <VARIABLE NAME="%%TYPE" VALUE="Oracle" />
            <VARIABLE NAME="%%VERS" VALUE="11g" />
            <VARIABLE NAME="%%LENGHT" VALUE="16" />
        </JOB>
        <JOB JOBID="839" APP="TRY" JOBNAME="JOBB" DESC="Extrac617" TYPE="db" FORM="Databases" CM_VER="N/A" MULTY_AGENT="N" VERSION_SERIAL="1" PARENT_FOLDER="UNIX">
            <VARIABLE NAME="%%TYPEDB" VALUE="Open Query" />
            <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
            <VARIABLE NAME="%%TYPE" VALUE="Oracle" />
            <VARIABLE NAME="%%VERS" VALUE="11g" />
            <VARIABLE NAME="%%LENGHT" VALUE="16" />
        </JOB>
    </FOLD>
</DEF>

Here is my code:

from xml.etree import ElementTree
from contextlib import redirect_stdout
from collections import Counter

import xlsxwriter
import pprint
import os
import datetime

begin_time = datetime.datetime.now()
print(datetime.datetime.now())
    
### PATH VARIABLES
fileXML= 'C:\\xxxx\\completa.xml'
cont=0
path= 'C:\\xxxxxx\\todo.csv'
outputExcel = 'C:\\xxxxxxx\\salida.xlsx'
file2 = 'C:\\xxxxxxxx\\todo.csv'




try:
    os.remove(path)
except OSError as e:  ## if failed, report it back to the user ##
    print ("Error: %s - %s." % (e.filename, e.strerror))

try:
    os.remove(outputExcel)
except OSError as e:  ## if failed, report it back to the user ##
    print ("Error: %s - %s." % (e.filename, e.strerror))




with open(fileXML, encoding="utf8") as f:
    tree = ElementTree.parse(f)

    
# using getchildren() within root and check if tag starts with keyword 
#print [node.text for node in root.getchildren() if node.tag.startswith('%%FTP-LPATH')]
root = tree.getroot()
set_tipos= set()
lista_tipos = list()

for node in tree.iter('JOB'):


    name = node.attrib.get('JOBNAME')
    appl_type = node.attrib.get('TYPE')
    appl_form = node.attrib.get('FORM')
    

    if appl_form:
        set_tipos.add(appl_form)
        lista_tipos.append(appl_form)
    else:
        set_tipos.add(appl_type)
        lista_tipos.append(appl_type)
        
     
    
print('#####################################')


cuenta1 = Counter(lista_tipos)
print(cuenta1)

workbook = xlsxwriter.Workbook(outputExcel)
worksheet = workbook.add_worksheet('Resumen')
bold =workbook.add_format({'bold':1})
centrado = workbook.add_format({'bold': True, 'center_across': True})

headings = list()
datos = list()
listaBD = list()
pointDB=1

for key, value in cuenta1.items():
    # scores = 0
    print(key)
    headings.append(key)
    datos.append(value)



if 'Databases' in headings:
    print('Existe base de datos')
    
    for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"): 
        listaBD.clear()
        name = node4.attrib.get('JOBNAME')
        appl_type = node4.attrib.get('TYPE')
        listaBD.append(name)
        listaBD.append(appl_type)
        for node4 in tree.iterfind(".//JOB/VARIABLE[@NAME='%%TYPE_DB']"):
            a=0 

        value = node4.attrib.get('VALUE', 'N/A')
        listaBD.append(value) 


        worksheet.write_row('A'+str(pointDB),listaBD)
        pointDB += 1


        node4.clear()

else:
    print("Doesn't exist")

print(datetime.datetime.now() - begin_time)

workbook.close()

                ...................

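This works, but I have 6 ifs, one for each case, and the program is very slow. I have edited the main code. The problem seems to be the size of the XML: the file is 50 MB. The output is:

How else can I get these values conditionally?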
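The 50 MB size is probably not the only cost. In the code above, `tree.iterfind(".//JOB/VARIABLE[@NAME='%%TYPE_DB']")` runs inside the job loop and walks the whole document again for every job, so the work grows roughly with (number of jobs) × (document size). A minimal sketch, assuming only `TYPE="db"` jobs and the `%%TYPE_DB` variable matter: `ET.iterparse` streams the input, each JOB is handled when its end tag is reached using a lookup relative to that JOB, and the finished JOB is then cleared. `stream_db_jobs` is a hypothetical name, and the sample is a trimmed, made-up variant of the question's XML.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<DEF>
  <FOLD>
    <JOB JOBNAME="JOBA" TYPE="db">
      <VARIABLE NAME="%%TYPE_DB" VALUE="Open Query" />
    </JOB>
    <JOB JOBNAME="JOBB" TYPE="db">
      <VARIABLE NAME="%%TYPEDB" VALUE="Open Query" />
    </JOB>
    <JOB JOBNAME="JOBC" TYPE="os">
      <VARIABLE NAME="%%TYPE_DB" VALUE="Ignored" />
    </JOB>
  </FOLD>
</DEF>"""


def stream_db_jobs(source):
    """Yield (JOBNAME, %%TYPE_DB value or 'N/A') for every TYPE='db' JOB.

    `source` is a filename or a binary file object. Each JOB is processed
    as soon as its end tag has been parsed.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "JOB":
            continue
        if elem.get("TYPE") == "db":
            # Relative lookup: only this JOB's own VARIABLE children.
            var = elem.find("VARIABLE[@NAME='%%TYPE_DB']")
            value = var.get("VALUE", "N/A") if var is not None else "N/A"
            yield elem.get("JOBNAME"), value
        # Drop the finished JOB's attributes and children to save memory.
        elem.clear()


rows = list(stream_db_jobs(io.BytesIO(SAMPLE)))
print(rows)
```

Passing the real path (e.g. `fileXML`) instead of the `BytesIO` object should work the same way. Clearing each JOB removes its VARIABLE children, but the empty JOB elements stay attached to FOLD; for very large files you may also want to detach them from their parent.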

Best answer

I see at least two things you can change to get rid of the if blocks.

A more specific XPath

I see this "if the type matches, do everything; otherwise skip" branch:

appl_type = node4.attrib.get('TYPE')
if appl_type == 'db':

You can avoid it by moving the check on @TYPE into your top-level XPath:

for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"):
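A small self-contained check of the predicate approach, using a made-up FOLD that also contains one non-db job: only jobs whose TYPE is "db" are returned by `iterfind()`, so the loop body needs no if.

```python
import xml.etree.ElementTree as ET

xml_text = """<FOLD>
  <JOB JOBNAME="JOBA" TYPE="db" />
  <JOB JOBNAME="JOBX" TYPE="os" />
  <JOB JOBNAME="JOBB" TYPE="db" />
</FOLD>"""

root = ET.fromstring(xml_text)

# The [@TYPE='db'] predicate does the filtering inside the XPath itself.
db_names = [job.get("JOBNAME") for job in root.iterfind(".//JOB[@TYPE='db']")]
print(db_names)
```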

Default value in get()

You get an attribute value and decide what to do based on whether it is None:

if node4.attrib.get('VALUE') is None:
    print("don't exists")
    listaBD.append('N/A')
else:
    print("Existe")
    listaBD.append(node4.attrib.get('VALUE'))

Instead, pass a default value to the get() method for the case where the attribute is missing, and drop the check:

value = node4.attrib.get('VALUE', 'N/A')
listaBD.append(value)
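The default passed to `get()` only covers a VARIABLE that exists but has no VALUE attribute. When the VARIABLE element itself is missing, `find()` returns None, and that case still needs its own check. A minimal sketch, with a hypothetical helper `value_or_na` and a made-up JOB element:

```python
import xml.etree.ElementTree as ET

job = ET.fromstring(
    '<JOB JOBNAME="JOBB">'
    '<VARIABLE NAME="%%TYPEDB" VALUE="Open Query" />'
    '<VARIABLE NAME="%%APPLOG" />'
    '</JOB>'
)


def value_or_na(job, name):
    """Return the VALUE of this JOB's VARIABLE called `name`, or 'N/A'."""
    var = job.find(f"VARIABLE[@NAME='{name}']")
    if var is None:                 # no such VARIABLE element at all
        return "N/A"
    return var.get("VALUE", "N/A")  # element exists but may lack VALUE


print(value_or_na(job, "%%TYPEDB"))   # Open Query
print(value_or_na(job, "%%APPLOG"))   # N/A (VALUE attribute missing)
print(value_or_na(job, "%%TYPE_DB"))  # N/A (VARIABLE element missing)
```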

Misc

This doesn't remove an if block, but it restructures the code and makes it easier to follow:

if 'DB' not in headings:
    quit()  # or `return` from a function/method... don't go on

print('DB Exists')

The if is still there, but nothing is nested under it anymore.
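The "return from a function" variant of this guard clause could look like the sketch below; `db_job_names` is a hypothetical helper and the one-job XML is made up. In a standalone script, `sys.exit()` is the usual choice over `quit()`, which is intended for the interactive interpreter.

```python
import xml.etree.ElementTree as ET


def db_job_names(root, headings):
    """Hypothetical helper: JOBNAMEs of TYPE='db' jobs, or None if 'DB' is absent."""
    # Guard clause: return early instead of nesting the rest under an if.
    if "DB" not in headings:
        return None

    print("DB Exists")
    # Everything below runs at the function's top indentation level.
    return [job.get("JOBNAME") for job in root.iterfind(".//JOB[@TYPE='db']")]


root = ET.fromstring('<FOLD><JOB JOBNAME="JOBA" TYPE="db" /></FOLD>')
print(db_job_names(root, ["Databases"]))  # None
print(db_job_names(root, ["DB"]))         # prints "DB Exists", then ['JOBA']
```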

Here is my full edit of your original code:

from xml.etree import ElementTree as ET

tree = ET.parse('input.xml')
headings = ['DB']
listaBD = []

if 'DB' not in headings:
    quit()  # or `return` from a function/method... don't go on

print('DB Exists')

for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"):
    print('//////////////////////////')
    print(node4.tag, node4.attrib)
    name = node4.attrib.get('JOBNAME')
    appl_type = node4.attrib.get('TYPE')
    
    print(name, appl_type)

    # Relative path: searches only this JOB's own VARIABLE children,
    # not the whole tree, so another job's variable cannot be picked up.
    var = node4.find("VARIABLE[@NAME='%%TYPE_DB']")
    value = var.attrib.get('VALUE', 'N/A') if var is not None else 'N/A'
    print('TIPO')
    print(value)
    listaBD.append(value)

    schema = node4.find("VARIABLE[@NAME='%%DBSCHEMA']")
    print(schema)
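Since the question mentions one if per variable (six in total), another way to drop them is to read each JOB's VARIABLE children into a NAME-to-VALUE dict once and then look up every wanted name with a default. A minimal sketch, assuming the variables of interest are listed in a hypothetical `WANTED` list and using the question's sample XML with some attributes and variables trimmed:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<DEF>
  <FOLD SERV="TRYING">
    <JOB JOBNAME="JOBA" TYPE="db" FORM="Databases">
      <VARIABLE NAME="%%TYPE_DB" VALUE="Open Query" />
      <VARIABLE NAME="%%APPLOG" VALUE="N" />
      <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
    </JOB>
    <JOB JOBNAME="JOBB" TYPE="db" FORM="Databases">
      <VARIABLE NAME="%%TYPEDB" VALUE="Open Query" />
      <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
    </JOB>
  </FOLD>
</DEF>"""

# Hypothetical list of variables to export, one column each.
WANTED = ["%%TYPE_DB", "%%APPLOG", "%%ACCOUNT"]


def job_rows(root, wanted):
    """One row per TYPE='db' JOB: JOBNAME, TYPE, then one value per wanted name."""
    rows = []
    for job in root.iterfind(".//JOB[@TYPE='db']"):
        # Single pass over this JOB's direct VARIABLE children: NAME -> VALUE.
        variables = {
            var.get("NAME"): var.get("VALUE", "N/A")
            for var in job.findall("VARIABLE")
        }
        row = [job.get("JOBNAME"), job.get("TYPE")]
        # dict.get with a default replaces one if/else per variable.
        row.extend(variables.get(name, "N/A") for name in wanted)
        rows.append(row)
    return rows


for row in job_rows(ET.fromstring(SAMPLE), WANTED):
    print(row)
```

Each row is a plain list, so it could be written with `worksheet.write_row(...)` the same way the question's code writes `listaBD`.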

Regarding "python - How to conditionally get values from XML with ElementTree?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70775545/
