Python: extract an XML element's value when a child attribute meets a condition

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:23:26

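I am a beginner at XML parsing, and I am struggling to extract specific values when a child attribute satisfies some condition.

Here is a sample of my XML file (from http://www.uniprot.org/uniprot/Q63HN8.xml ):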

<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
  <entry dataset="Swiss-Prot" created="2007-02-20" modified="2015-09-16" version="112">
    <accession>Q63HN8</accession>
    <accession>C9JCP4</accession>
    <accession>D6RI12</accession>
    <dbReference type="Proteomes" id="UP000005640">
      <property type="component" value="Chromosome 17"/>
    </dbReference>
    <dbReference type="Bgee" id="Q63HN8"/>
    <dbReference type="CleanEx" id="HS_KIAA1618"/>
    <dbReference type="ExpressionAtlas" id="Q63HN8">
      <property type="expression patterns" value="baseline and differential"/>
    </dbReference>
    <dbReference type="GO" id="GO:0005737">
      <property type="term" value="C:cytoplasm"/>
      <property type="evidence" value="ECO:0000314"/>
      <property type="project" value="UniProtKB"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016020">
      <property type="term" value="C:membrane"/>
      <property type="evidence" value="ECO:0000314"/>
      <property type="project" value="UniProtKB"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016887">
      <property type="term" value="F:ATPase activity"/>
      <property type="evidence" value="ECO:0000314"/>
      <property type="project" value="UniProtKB"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016874">
      <property type="term" value="F:ligase activity"/>
      <property type="evidence" value="ECO:0000501"/>
      <property type="project" value="UniProtKB-KW"/>
    </dbReference>

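I want to extract the "id" value of a dbReference whenever the "value" attribute of one of its property children starts with "C:".
So the expected output is: "GO:0005737" "GO:0016020"

Here is my script so far: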

import urllib2
from lxml import etree

file = urllib2.urlopen('http://www.uniprot.org/uniprot/Q63HN8.xml')
tree = etree.parse(file)
root = tree.getroot()
for node in tree.iter('{http://uniprot.org/uniprot}dbReference'):
    if node.attrib.get('type') == 'GO':
        value = node.attrib.get('value');
        print value
        if value.str.startswith('C:'):
            goterm = node.attrib.get('id')
            print goterm

But it is far from working.

Edit

Also, how can I store the values from the different searches in lists? Expected:
goterm_when_C = ['GO:0005737', 'GO:0016020', 'GO:0005730']
goterm_when_F = ['GO:0016887', 'GO:0016874', 'GO:0004842', 'GO:0008270']
When I try:

goterm_when_C = []
goterm_when_F = []
if value.startswith('C:'):
    go_location = node.attrib.get('id')
    for item in go_location:
        goterm_when_C.append(item)
if value.startswith('F:'):
    go_function = node.attrib.get('id')
    for item in go_function:
        goterm_when_F.append(item)
    break

I get

>>> goterm_when_C
['G', 'O', ':', '0', '0', '0', '5', '7', '3', '7', 'G', 'O', ':', '0', '0', '1', '6', '0', '2', '0', 'G', 'O', ':', '0', '0', '0', '5', '7', '3', '0']

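Any help would be appreciated.

Best answer

The "value" attribute is on the property children, not on dbReference itself, so you need to iterate over the child nodes and check their attributes. Example -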

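for node in tree.iter('{http://uniprot.org/uniprot}dbReference'):
    if node.attrib.get('type') == 'GO':
        # Walk the <property> children of this dbReference.
        for child in node:
            value = child.attrib.get('value')
            print value
            if value.startswith('C:'):
                # The id is read from the parent dbReference, not the child.
                goterm = node.attrib.get('id')
                print goterm
                break  # one matching child is enough for this dbReference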
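
For the follow-up about storing the results in lists, the stray characters in the question's output most likely come from looping over the id string itself: iterating a Python string yields one character at a time, so each character gets appended separately. Appending the whole string once avoids that. The following is only a sketch under some assumptions, not the answerer's code: it is written for Python 3 rather than the Python 2 used in the thread, it uses the standard library's xml.etree.ElementTree in place of lxml (the iter and attrib calls used here are available in both), it parses a trimmed inline copy of the sample instead of downloading the file, and the names NS, SAMPLE and collect_go_ids are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's {uri}tag notation, taken from the sample file.
NS = '{http://uniprot.org/uniprot}'

# Trimmed inline copy of the question's sample so the sketch runs offline (assumption).
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <dbReference type="Bgee" id="Q63HN8"/>
    <dbReference type="GO" id="GO:0005737">
      <property type="term" value="C:cytoplasm"/>
      <property type="evidence" value="ECO:0000314"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016020">
      <property type="term" value="C:membrane"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016887">
      <property type="term" value="F:ATPase activity"/>
    </dbReference>
    <dbReference type="GO" id="GO:0016874">
      <property type="term" value="F:ligase activity"/>
    </dbReference>
  </entry>
</uniprot>"""


def collect_go_ids(root, prefixes=('C:', 'F:')):
    """Group GO ids by the prefix of a child property's value (hypothetical helper)."""
    found = {prefix: [] for prefix in prefixes}
    for node in root.iter(NS + 'dbReference'):
        if node.attrib.get('type') != 'GO':
            continue
        for child in node:
            # Default to '' so a child without a value attribute cannot raise.
            value = child.attrib.get('value', '')
            prefix = next((p for p in prefixes if value.startswith(p)), None)
            if prefix is not None:
                # Append the whole id string once; looping over it would add characters.
                found[prefix].append(node.attrib.get('id'))
                break  # one matching child is enough for this dbReference
    return found


root = ET.fromstring(SAMPLE)
go_ids = collect_go_ids(root)
goterm_when_C = go_ids['C:']
goterm_when_F = go_ids['F:']
print(goterm_when_C)
print(goterm_when_F)
```

On the trimmed sample this gives one list per prefix holding full id strings. Run against the complete UniProt file, the lists would presumably also pick up the extra ids mentioned in the question, though that was not checked here.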

Regarding "Python: extract an XML element's value when a child attribute meets a condition", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32880639/
