python - Find all tags with a specific attribute value

Reposted by: 太空宇宙. Updated: 2023-11-04 01:42:17


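How can I iterate over all tags that have a specific attribute with a specific value? For example, suppose we only need data1, data2, and so on.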

<html>
    <body>
        <invalid html here/>
        <dont care> ... </dont care>
        <invalid html here too/>
        <interesting attrib1="naah, it is not this"> ... </interesting tag>
        <interesting attrib1="yes, this is what we want">
            <group>
                <line>
                    data
                </line>
            </group>
            <group>
                <line>
                    data1
                <line>
            </group>
            <group>
                <line>
                    data2
                <line>
            </group>
        </interesting>
    </body>
</html>

I tried BeautifulSoup, but it could not parse the file. lxml's parser, however, seems to work:

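from io import StringIO  # assuming Python 3

from lxml import etree

# get_sanitized_data and SITE are the asker's own helper and constant (not shown).
broken_html = get_sanitized_data(SITE)

parser = etree.HTMLParser()
tree = etree.parse(StringIO(broken_html), parser)

result = etree.tostring(tree.getroot(), pretty_print=True, method="html")

print(result)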

I am not familiar with its API, and I don't know how to use getiterator or xpath.

Best answer

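Here is one way, using lxml and the XPath 'descendant::*[@attrib1="yes, this is what we want"]'. The XPath tells lxml to look at all descendants of the current node and return those whose attrib1 attribute equals "yes, this is what we want".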

import io

import lxml.html as lh

content = '''
<html>
    <body>
        <invalid html here/>
        <dont care> ... </dont care>
        <invalid html here too/>
        <interesting attrib1="naah, it is not this"> ... </interesting tag>
        <interesting attrib1="yes, this is what we want">
            <group>
                <line>
                    data
                </line>
            </group>
            <group>
                <line>
                    data1
                <line>
            </group>
            <group>
                <line>
                    data2
                <line>
            </group>
        </interesting>
    </body>
</html>
'''
# io.StringIO replaces the Python 2-only cStringIO module on Python 3.
doc = lh.parse(io.StringIO(content))
# Select every descendant element whose attrib1 equals the wanted value.
tags = doc.xpath('descendant::*[@attrib1="yes, this is what we want"]')
print(tags)
# [<Element interesting at b767e14c>]
for tag in tags:
    # encoding="unicode" makes tostring return str instead of bytes.
    print(lh.tostring(tag, encoding="unicode"))
# <interesting attrib1="yes, this is what we want"><group><line>
#                     data
#                 </line></group><group><line>
#                     data1
#                 <line></line></line></group><group><line>
#                     data2
#                 <line></line></line></group></interesting>
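
The question asks for the text inside the matching element (data, data1, data2) rather than the element itself. The following is a minimal sketch of that last step, not part of the original answer. It assumes a simplified, well-formed version of the sample markup so that the nesting is unambiguous; the names `WANTED`, `lines_via_xpath`, and `lines_via_iter` are illustrative only. It shows two routes: a single XPath query, and the `iter()` method, which lxml documents as the replacement for the deprecated `getiterator()`.

```python
import lxml.html as lh

# Assumption: a cleaned-up, well-formed variant of the sample from the question.
content = '''
<html>
    <body>
        <interesting attrib1="naah, it is not this"> ... </interesting>
        <interesting attrib1="yes, this is what we want">
            <group><line>data</line></group>
            <group><line>data1</line></group>
            <group><line>data2</line></group>
        </interesting>
    </body>
</html>
'''

WANTED = "yes, this is what we want"  # attribute value to match


def lines_via_xpath(doc, value):
    """Collect the text of each <line> under elements whose attrib1 == value."""
    # $value is an XPath variable; lxml fills it from the keyword argument,
    # which avoids quoting problems inside the expression.
    lines = doc.xpath('//*[@attrib1=$value]/group/line', value=value)
    return [(line.text or "").strip() for line in lines]


def lines_via_iter(doc, value):
    """Same result using iter() and a plain attribute comparison."""
    found = []
    for element in doc.iter("interesting"):
        if element.get("attrib1") == value:
            for line in element.iter("line"):
                found.append((line.text or "").strip())
    return found


doc = lh.fromstring(content)
xpath_values = lines_via_xpath(doc, WANTED)
iter_values = lines_via_iter(doc, WANTED)
print(xpath_values)
print(iter_values)
```

On the broken markup from the question, the unclosed `<line>` tags end up nested inside one another (as the serialized output above shows), so the text may need extra cleanup there. The XPath route is shorter, while the `iter()` route may be easier to adapt when the filter is more than a simple equality test.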

Regarding "python - Find all tags with a specific attribute value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/3778512/
