
python - How to conditionally get values from XML with ElementTree?

Reposted. Author: 行者123. Updated: 2023-12-05 04:36:51

I have an XML file from which I need to extract some values; if a value does not exist, I want "N/A" instead.

<?xml version="1.0" encoding="utf-8"?>
<DEF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="FF.xsd">
<FOLD SERV="TRYING" VERSION="918" PLATFORM="UNIX" GROUP_NAME="UNIX" MODIFIED="False" LAST_UPLOAD="20220117145134UTC" TYPE="1" USED_BY_CODE="0">
<JOB JOBID="835" APP="TRY" JOBNAME="JOBA" DESC="Extrac607" TYPE="db" FORM="Databases" CM_VER="N/A" MULTY_AGENT="N" VERSION_SERIAL="1" PARENT_FOLDER="UNIX">
<VARIABLE NAME="%%TYPE_DB" VALUE="Open Query" />
<VARIABLE NAME="%%APPLOG" VALUE="N" />
<VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
<VARIABLE NAME="%%TYPE" VALUE="Oracle" />
<VARIABLE NAME="%%VERS" VALUE="11g" />
<VARIABLE NAME="%%LENGHT" VALUE="16" />
</JOB>
<JOB JOBID="839" APP="TRY" JOBNAME="JOBB" DESC="Extrac617" TYPE="db" FORM="Databases" CM_VER="N/A" MULTY_AGENT="N" VERSION_SERIAL="1" PARENT_FOLDER="UNIX">
<VARIABLE NAME="%%TYPEDB" VALUE="Open Query" />
<VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE" />
<VARIABLE NAME="%%TYPE" VALUE="Oracle" />
<VARIABLE NAME="%%VERS" VALUE="11g" />
<VARIABLE NAME="%%LENGHT" VALUE="16" />
</JOB>
</FOLD>
</DEF>

This is my code:

from xml.etree import ElementTree
from contextlib import redirect_stdout
from collections import Counter

import xlsxwriter
import pprint
import os
import datetime

begin_time = datetime.datetime.now()
print(datetime.datetime.now())

###VARIABLES RUTASL
fileXML= 'C:\\xxxx\\completa.xml'
cont=0
path= 'C:\\xxxxxx\\todo.csv'
outputExcel = 'C:\\xxxxxxx\\salida.xlsx'
file2 = 'C:\\xxxxxxxx\\todo.csv'




try:
    os.remove(path)
except OSError as e:  ## if failed, report it back to the user ##
    print ("Error: %s - %s." % (e.filename, e.strerror))

try:
    os.remove(outputExcel)
except OSError as e:  ## if failed, report it back to the user ##
    print ("Error: %s - %s." % (e.filename, e.strerror))




with open(fileXML, encoding="utf8") as f:
    tree = ElementTree.parse(f)


# using getchildren() within root and check if tag starts with keyword
#print [node.text for node in root.getchildren() if node.tag.startswith('%%FTP-LPATH')]
root = tree.getroot()
set_tipos= set()
lista_tipos = list()

for node in tree.iter('JOB'):

    name = node.attrib.get('JOBNAME')
    appl_type = node.attrib.get('TYPE')
    appl_form = node.attrib.get('FORM')

    if appl_form:
        set_tipos.add(appl_form)
        lista_tipos.append(appl_form)
    else:
        set_tipos.add(appl_type)
        lista_tipos.append(appl_type)



print('#####################################')


cuenta1 = Counter(lista_tipos)
print(cuenta1)

workbook = xlsxwriter.Workbook(outputExcel)
worksheet = workbook.add_worksheet('Resumen')
bold =workbook.add_format({'bold':1})
centrado =workbook.add_format({'center_across':1})
centrado = workbook.add_format({'bold': True, 'center_across': True})

headings = list()
datos = list()
listaBD = list()
pointDB=1

for key, value in cuenta1.items():
    # scores = 0
    print(key)
    headings.append(key)
    datos.append(value)



if 'Databases' in headings:
    print('Existe base de datos')

    for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"):
        listaBD.clear()
        name = node4.attrib.get('JOBNAME')
        appl_type = node4.attrib.get('TYPE')
        listaBD.append(name)
        listaBD.append(appl_type)
        for node4 in tree.iterfind(".//JOB/VARIABLE[@NAME='%%TYPE_DB']"):
            a=0

            value = node4.attrib.get('VALUE', 'N/A')
            listaBD.append(value)

        worksheet.write_row('A'+str(pointDB),listaBD)
        pointDB += 1

        node4.clear()

else:
    print("Don't exists")

print(datetime.datetime.now() - begin_time)

workbook.close()

...................

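This works, but I have 6 ifs for each case, and the program is very slow. I have edited the main code. It seems the problem is the size of the XML, which is 50 MB. The output is: (image: Output). How else can I get these values conditionally?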

Best answer

I see at least two things you can change to get rid of the if blocks.

A more specific XPath

I see this "do everything if the type matches, otherwise skip" branch:

appl_type = node4.attrib.get('TYPE')
if appl_type == 'db':

You can avoid it by moving the check on @TYPE into your top-level XPath:

for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"):
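
As a minimal sketch of that predicate, assuming a trimmed-down sample document (the `SAMPLE` string and the `db_job_names` helper are made up for illustration and are not part of the original code), the filtering happens inside `iterfind`, so the loop body needs no `if`:

```python
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down sample; not the asker's real file.
SAMPLE = """
<DEF>
  <FOLD>
    <JOB JOBNAME="JOBA" TYPE="db"/>
    <JOB JOBNAME="JOBX" TYPE="os"/>
    <JOB JOBNAME="JOBB" TYPE="db"/>
  </FOLD>
</DEF>
"""

def db_job_names(root):
    """Return the JOBNAME of every JOB whose TYPE attribute is 'db'."""
    # The [@TYPE='db'] predicate does the filtering the `if` used to do.
    return [job.get('JOBNAME') for job in root.iterfind(".//JOB[@TYPE='db']")]

root = ET.fromstring(SAMPLE)
print(db_job_names(root))
```

The job with TYPE="os" never reaches the loop body, which is what makes the explicit type check redundant.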

A default value for the getter

You fetch an attribute's value and then decide what to do depending on whether it is None:

if node4.attrib.get('VALUE') is None:
    print("don't exists")
    listaBD.append('N/A')
else:
    print("Existe")
    listaBD.append(node4.attrib.get('VALUE'))

Pass a default value to the get() method instead; it is returned when the attribute is missing, so the check is no longer needed:

value = node4.attrib.get('VALUE', 'N/A')
listaBD.append(value)
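
One caveat worth spelling out: the default passed to `get()` only covers an attribute that is missing on an element that exists. If the VARIABLE element itself is absent from a JOB (as with JOBB in the question's sample, which has %%TYPEDB rather than %%TYPE_DB), the lookup has to handle `find()` returning None. A minimal sketch, assuming a shortened sample and a hypothetical helper named `variable_value`:

```python
from xml.etree import ElementTree as ET

# Hypothetical sample: JOBA has a %%TYPE_DB variable, JOBB does not.
SAMPLE = """
<FOLD>
  <JOB JOBNAME="JOBA" TYPE="db">
    <VARIABLE NAME="%%TYPE_DB" VALUE="Open Query"/>
  </JOB>
  <JOB JOBNAME="JOBB" TYPE="db">
    <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE"/>
  </JOB>
</FOLD>
"""

def variable_value(job, name, default='N/A'):
    """Return the VALUE of this job's VARIABLE called `name`, or `default`."""
    # Search only the children of this JOB, not the whole tree.
    var = job.find(f"VARIABLE[@NAME='{name}']")
    if var is None:                   # the VARIABLE element itself is missing
        return default
    return var.get('VALUE', default)  # element exists, attribute may not

root = ET.fromstring(SAMPLE)
values = [variable_value(job, '%%TYPE_DB') for job in root.iterfind('JOB')]
print(values)
```

This keeps a single `if` inside the helper, so the calling code stays free of per-variable branches.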

Miscellaneous

This does not remove an if block, but it restructures the code and makes it easier to follow:

if 'DB' not in headings:
    quit()  # or `return` from a function/method... don't go on

print('DB Exists')

The if is still there, but now nothing else is nested underneath it.
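
The same guard can be written with `return` when the logic lives in a function. The sketch below is only an illustration of that shape; the function name `collect_db_jobs` and the one-line sample document are assumptions, not part of the original code:

```python
from xml.etree import ElementTree as ET

def collect_db_jobs(root, headings):
    """Return the names of all 'db' jobs, or an empty list if 'DB' is not a heading."""
    if 'DB' not in headings:
        return []  # guard clause: leave early, nothing below is nested

    print('DB Exists')
    return [job.get('JOBNAME') for job in root.iterfind(".//JOB[@TYPE='db']")]

# Hypothetical one-job document, just to exercise both paths.
root = ET.fromstring('<FOLD><JOB JOBNAME="JOBA" TYPE="db"/></FOLD>')
print(collect_db_jobs(root, ['DB']))
print(collect_db_jobs(root, ['OS']))
```

Unlike `quit()`, returning early lets the rest of the script (for example closing the workbook) still run.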

Here is my complete edit of your original code:

from xml.etree import ElementTree as ET

tree = ET.parse('input.xml')
headings = ['DB']
listaBD = []

if 'DB' not in headings:
quit() # or `return` from a function/method... don't go on

print('DB Exists')

for node4 in tree.iterfind(".//JOB[@TYPE = 'db']"):
    print('//////////////////////////')
    print(node4.tag, node4.attrib)
    name = node4.attrib.get('JOBNAME')
    appl_type = node4.attrib.get('TYPE')

    print(name + " " + appl_type)
    for node4 in tree.iterfind(".//JOB/VARIABLE[@NAME='%%TYPE_DB']"):
        print('TIPO')
        print(node4.attrib.get('VALUE'))

        value = node4.attrib.get('VALUE', 'N/A')
        listaBD.append(value)

        node4.clear()
    for node4 in tree.iterfind(".//JOB/VARIABLE[@NAME='%%DBSCHEMA']"):
        print(node4)
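
Note that the inner loops above search the whole tree for every JOB. On a 50 MB file that repeated scan may well be part of the slowness, and it does not tie a VARIABLE to the JOB it belongs to. A possible alternative, offered as a sketch under assumptions rather than as the answer's method, is to index each job's own VARIABLE children once and read the wanted names with a dictionary default. The `WANTED` list, the `job_rows` helper, and the shortened sample are hypothetical:

```python
from xml.etree import ElementTree as ET

# Hypothetical sample based on the question: JOBB lacks %%TYPE_DB.
SAMPLE = """
<DEF><FOLD>
  <JOB JOBNAME="JOBA" TYPE="db">
    <VARIABLE NAME="%%TYPE_DB" VALUE="Open Query"/>
    <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE"/>
    <VARIABLE NAME="%%VERS" VALUE="11g"/>
  </JOB>
  <JOB JOBNAME="JOBB" TYPE="db">
    <VARIABLE NAME="%%TYPEDB" VALUE="Open Query"/>
    <VARIABLE NAME="%%ACCOUNT" VALUE="ONLINE"/>
    <VARIABLE NAME="%%VERS" VALUE="11g"/>
  </JOB>
</FOLD></DEF>
"""

# Assumed list of variables to export, one output column each.
WANTED = ['%%TYPE_DB', '%%ACCOUNT', '%%VERS']

def job_rows(root, wanted=WANTED):
    """Build one row per 'db' job: name, type, then each wanted variable or 'N/A'."""
    rows = []
    for job in root.iterfind(".//JOB[@TYPE='db']"):
        # Index this job's own VARIABLE children once: NAME -> VALUE.
        variables = {v.get('NAME'): v.get('VALUE', 'N/A')
                     for v in job.iterfind('VARIABLE')}
        row = [job.get('JOBNAME'), job.get('TYPE')]
        # dict.get supplies 'N/A' for names the job does not define.
        row += [variables.get(name, 'N/A') for name in wanted]
        rows.append(row)
    return rows

root = ET.fromstring(SAMPLE)
for row in job_rows(root):
    print(row)
```

Each row could then be passed to `worksheet.write_row` the same way the original script does, one call per job, with no per-variable `if`.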

Regarding "python - How to conditionally get values from XML with ElementTree?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/70775545/
