
python - lxml cssselect parsing

Reposted. Author: 太空宇宙. Updated: 2023-11-04 13:57:14

I have a document containing the following data:

<div class="ds-list">
<b>1. </b>
A domesticated carnivorous mammal
<i>(Canis familiaris)</i>
related to the foxes and wolves and raised in a wide variety of breeds.
</div>

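I want to get all of the text inside the element with class ds-list (without the <b> and <i> tags). Currently my code is doc.cssselect('div.ds-list'), but all that gives me is the newline before the <b>. How can I make it do what I want?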

Best answer

Perhaps you are looking for the text_content method?

import lxml.html as lh

content = '''\
<div class="ds-list">
<b>1. </b>
A domesticated carnivorous mammal
<i>(Canis familiaris)</i>
related to the foxes and wolves and raised in a wide variety of breeds.
</div>'''

doc = lh.fromstring(content)
# text_content() returns the element's text plus the text of all its descendants
for div in doc.cssselect('div.ds-list'):
    print(div.text_content())

yields

1.  
A domesticated carnivorous mammal
(Canis familiaris)
related to the foxes and wolves and raised in a wide variety of breeds.
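
To see why the selector on its own seemed to return only a newline, it may help to look at how lxml stores text around child elements. The following is a minimal sketch that reuses the sample markup from the question. It assumes the div is the only top-level element, so that fromstring() returns it directly, and the names div and normalized are illustrative rather than part of the original answer.

```python
import lxml.html as lh

content = '''\
<div class="ds-list">
<b>1. </b>
A domesticated carnivorous mammal
<i>(Canis familiaris)</i>
related to the foxes and wolves and raised in a wide variety of breeds.
</div>'''

# Assumption: with a single top-level element, fromstring() returns that <div>.
div = lh.fromstring(content)

# .text holds only the text before the first child element,
# which here is just the whitespace in front of <b>.
print(repr(div.text))

# Text that follows a child element is stored on that child's .tail.
for child in div:
    print(child.tag, repr(child.text), repr(child.tail))

# text_content() joins all of these pieces; split() and join() then
# collapse the runs of newlines and spaces into single spaces.
normalized = ' '.join(div.text_content().split())
print(normalized)
```

If a single-line string is preferred over the multi-line output shown above, the last step is one possible way to normalize the whitespace.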

Regarding python - lxml cssselect parsing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4909811/
