python - Excluding tags based on their content in BeautifulSoup

Reposted by: 行者123 Updated: 2023-11-28 21:09:35


I am scraping HTML data that looks like the following:

<div class="target-content">
    <p id="random1">
      "the content of the p"
    </p>

    <p id="random2">
      "the content of the p"
    </p>

    <p>
      <q class="semi-predictable">
         "q tag content that I don't want
      </q>
    </p>

    <p id="random3">
      "the content of the p"
    </p>

</div>

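My goal is to get all of the <p> tags and their content, while being able to exclude the <q> tag along with its content. Currently, I get all of the <p> tags with the following approach: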

contentlist = soup.find('div', class_='target-content').find_all('p')

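My question is: once I have the result set of all the <p> tags, how do I filter out the single <p> that contains the <q>, along with its content?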

Note: after getting the result set from soup.find('div', class_='target-content').find_all('p'), I iterate over it and add each <p> from the result set to my content in the following way:

content = ''
for p in contentlist:
    content += str(p)

Best answer

You can skip the p tags that have a q tag inside them:

for p in soup.select('div.target-content > p'):
    if p.q:  # if q is present - skip
        continue
    print(p)

Here p.q is a shortcut for p.find("q"). div.target-content > p is a CSS selector that matches all p tags that are direct children of the div element with the target-content class.
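To tie this back to the loop in the question, here is a minimal self-contained sketch of the same filtering. It assumes the bs4 package is installed and uses Python's built-in "html.parser". The html string is a condensed copy of the sample markup, and the names html and kept are only illustrative.

```python
from bs4 import BeautifulSoup

# Condensed copy of the sample markup from the question (illustrative only).
html = """
<div class="target-content">
    <p id="random1">"the content of the p"</p>
    <p id="random2">"the content of the p"</p>
    <p><q class="semi-predictable">"q tag content that I don't want"</q></p>
    <p id="random3">"the content of the p"</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the direct-child <p> tags that have no <q> descendant.
kept = [p for p in soup.select("div.target-content > p") if p.find("q") is None]

# Join the surviving tags into one string, as the loop in the question does.
content = "".join(str(p) for p in kept)
print(content)
```

Building the filtered list first and joining once avoids repeated string concatenation, but the explicit loop with continue shown in the answer does the same job.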

Regarding python - Excluding tags based on their content in BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38058262/
