
lxml - Getting text from individual elements of an etree parsed from XML

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:25:33

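The code below works fine, but isn't there a more Pythonic way to get the same functionality? I just want to parse the XML and get the text from a few elements (name, name_status, url).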

from lxml import etree
from urllib2 import urlopen

def ask_CoL(url):
    tree = etree.parse(urlopen(url))
    tn = [el.get('total_number_of_results') for el in tree.iter('results')]
    try:
        nr = int(tn[0])
    except ValueError:
        nr = 0
    if nr == 1:
        newstr = str([el.text for el in tree.getiterator(tag='name')])\
            .strip("[]'") + ','\
            + str([el.text for el in tree.getiterator(tag='name_status')])\
            .strip("[]'") + ','\
            + str([el.text for el in tree.getiterator(tag='url')])\
            .strip("[]'") + '\n'
    else:
        newstr = 'NA\n'
    return newstr

Sample XML:

<results id="" name="Theragra chalcogramma" total_number_of_results="1" number_of_results_returned="1" start="0" error_message="" version="1.6 rev 1152">
  <result>
    <id>9037795</id>
    <name>Theragra chalcogramma</name>
    <rank>Species</rank>
    <name_status>accepted name</name_status>
    <online_resource>http://www.fishbase.org/Summary/SpeciesSummary.php?ID=318</online_resource>
    <source_database>FishBase</source_database>
    <source_database_url>http://www.fishbase.org</source_database_url>
    <name_html><i>Theragra chalcogramma</i> (Pallas, 1814)</name_html>
    <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
  </result>
</results>

Best answer

You can simplify both the interface and the implementation:

import urllib2
from xml.etree import cElementTree as etree

def f(url):
    tree = etree.parse(urllib2.urlopen(url))
    # <result> is a direct child of the root <results> element
    el = tree.find('result')
    if el is not None:
        lst = [el.findtext(tag) or '' for tag in "name name_status url".split()]
        return ','.join(lst)
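
The answer's code targets Python 2 (`urllib2`, `cElementTree`). As a rough sketch of the same idea on Python 3, the version below parses XML text directly so it can be run without a network call. The names `summarize`, `FIELDS`, and `SAMPLE` are made up for illustration and are not part of the original answer. The `'NA'` fallback mirrors the question's behaviour, without the trailing newline.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for illustration; not from the original answer.
FIELDS = ("name", "name_status", "url")

SAMPLE = """<results total_number_of_results="1">
  <result>
    <name>Theragra chalcogramma</name>
    <name_status>accepted name</name_status>
    <url>http://www.catalogueoflife.org/col/details/species/id/9037795</url>
  </result>
</results>"""


def summarize(xml_text):
    """Return 'name,name_status,url' for a single <result>, else 'NA'."""
    root = ET.fromstring(xml_text)  # root is the <results> element
    try:
        total = int(root.get("total_number_of_results", "0"))
    except ValueError:
        # Attribute present but not a number (e.g. an empty string)
        total = 0
    result = root.find("result")  # first direct <result> child, or None
    if total != 1 or result is None:
        return "NA"
    # findtext returns the default when a tag is missing
    return ",".join(result.findtext(tag, default="") for tag in FIELDS)


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To work from a URL as in the question, the response body from `urllib.request.urlopen(url).read()` could be passed to `summarize`.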

A similar question on this topic (lxml: getting text from individual etree elements in XML) can be found on Stack Overflow: https://stackoverflow.com/questions/8612469/
