
python - Avoid outer element wrapping in lxml

Reposted. Author: 行者123. Updated: 2023-12-01 03:49:29

>>> from lxml import html
>>> html.tostring(html.fromstring('<div>1</div><div>2</div>'))
'<div><div>1</div><div>2</div></div>' # I don't want the outer <div>
>>> html.tostring(html.fromstring('I am pure text'))
'<p>I am pure text</p>' # I don't need the extra <p>

How can I avoid the outer <div> and <p> in lxml?

Best answer

By default, lxml will create a parent div when the string contains multiple elements.

You can work with the individual fragments instead:

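from lxml import html

test_cases = ['<div>1</div><div>2</div>', 'I am pure text']
for test_case in test_cases:
    # fragments_fromstring returns a list of elements; leading text,
    # if any, comes back as a plain string in the first position.
    fragments = html.fragments_fromstring(test_case)
    print(fragments)
    output = ''
    for fragment in fragments:
        if isinstance(fragment, str):
            # Plain text: append it unchanged.
            output += fragment
        else:
            # Element: serialize it (tostring returns bytes here).
            output += html.tostring(fragment).decode('UTF-8')
    print(output)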

Output:

[<Element div at 0x3403ea8>, <Element div at 0x3489368>]
<div>1</div><div>2</div>
['I am pure text']
I am pure text

Regarding "python - Avoid outer element wrapping in lxml", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38471001/
