Python - BeautifulSoup4 decompose() not working

Reposted by: 太空宇宙 | Updated: 2023-11-03 18:16:09

I am trying to get the categories of all the titles on this page.

from bs4 import BeautifulSoup
import urllib2

headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) \
         AppleWebKit/537.36 (KHTML, like Gecko) \
         Ubuntu Chromium/33.0.1750.152 Chrome/33.0.1750.152 Safari/537.36'
}
category_url = ''
html = urllib2.urlopen(urllib2.Request(category_url, None, headers)).read()
page = BeautifulSoup(html)
results = page.find('div', {'class': "results"}).find_all('li')

for res in results:
    category = res.find(attrs={'class': "category"}) or res.find(attrs={'class': "categories"})
    #print category  #till here, I'm getting correct data
    print category.b.decompose() #here is the problem? I should get the div element without <b> tag but it returns None

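I get None instead of the updated DOM.

PS: If you have any suggestions for improving this code, please let me know. I am happy to make changes for better performance and more Pythonic code.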

Best answer

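decompose() removes the tag from the tree and returns None, not the remaining tree. This is similar to how list.append and list.sort work: those methods also modify the object they are called on in place and return None. So call decompose() first, then print the parent tag: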

for res in results:
    category = res.find(attrs={'class': "category"}) or res.find(attrs={'class': "categories"})
    category.b.decompose()
    print(category)

which produces output like

<div class="categories">

<span class="highlighted">Advertising</span> <span class="highlighted">Agencies</span> </div>
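
The in-place behaviour described above can be checked without any network access. The following is only a minimal sketch: the HTML string is hypothetical markup loosely modelled on the output shown above, not the real page, and it assumes the standard-library "html.parser" backend. It also shows extract(), which removes a tag like decompose() does but returns the removed tag instead of None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption for illustration, not the real page).
HTML = (
    '<div class="categories"><b>Categories:</b> '
    '<span class="highlighted">Advertising</span> '
    '<span class="highlighted">Agencies</span></div>'
)

soup = BeautifulSoup(HTML, "html.parser")
category = soup.find("div", class_="categories")

# decompose() destroys the <b> tag in place and returns None,
# so print the parent afterwards rather than the return value.
result = category.b.decompose()
print(result)    # None
print(category)  # the div, now without its <b> child

# extract() also removes the tag from the tree, but returns the removed tag.
soup2 = BeautifulSoup(HTML, "html.parser")
category2 = soup2.find("div", class_="categories")
removed = category2.b.extract()
print(removed)   # <b>Categories:</b>
```

Note that category.b is None when a result has no <b> child, so in the scraping loop it may be worth checking for that before calling decompose().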

Using lxml:

import lxml.html as LH
import urllib2

category_url = 'http://www.localsearch.ae/en/category/Advertising-Agencies/1013'
doc = LH.parse(urllib2.urlopen(category_url))
# Match divs whose class is either "category" or "categories".
for category in doc.xpath(
    '//div[@class="category"]|//div[@class="categories"]'):
    b = category.find('b')
    # find() returns None when there is no <b> child; remove(None) would raise.
    if b is not None:
        category.remove(b)
    print(LH.tostring(category))
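
One caveat with the lxml approach is worth sketching: in lxml, the text that follows an element's closing tag is stored as that element's tail, so remove() discards it together with the element. The example below uses a hypothetical one-line HTML string (an assumption, not the real page) to contrast remove() with drop_tree(), a method on lxml.html elements that removes the element but keeps its tail text.

```python
import lxml.html as LH

# Hypothetical markup: the text after </b> is the <b> element's "tail".
HTML = '<div class="category"><b>Category:</b> Advertising Agencies</div>'

# remove() drops the <b> element together with its tail text.
div = LH.fromstring(HTML)
div.remove(div.find("b"))
print(LH.tostring(div))

# drop_tree() drops the <b> element but merges its tail into the parent.
div2 = LH.fromstring(HTML)
b = div2.find("b")
if b is not None:
    b.drop_tree()
print(LH.tostring(div2))
```

If the category names on the real page sit in child elements such as <span>, as in the BeautifulSoup output above, the difference may not matter, but it does whenever plain text directly follows the <b> tag.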

Regarding Python - BeautifulSoup4 decompose() not working, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24971962/
