
python - Parsing HTML with Python and Beautiful Soup

Reposted. Author: 可可西里. Updated: 2023-11-01 13:16:54

<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>

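I have used Beautiful Soup to get this far in the HTML. I now want to search this markup and extract data such as January 2010, Alaska, Owner, and Mad Dog Graphx. All of these values have the same class, but each one follows a different label, such as "Member Since", "AIGA Chapter", and so on. How can I search for Member Since and get January 2010, and then do the same for the other 3 fields?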

Best answer

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
... ''')
>>> for row in soup.findAll('div', {'class':'profile-row clearfix'}):
...     field, value = row.findAll(text=True)
...     print field, value
...
Member Since January 2010
AIGA Chapter Alaska
Title Owner
Company Mad Dog Graphx

Of course, you can do whatever you want with field and value, such as building a dictionary from them or storing them in a database.
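
The dictionary idea can be sketched as follows. This is a minimal example that assumes Beautiful Soup 4 (the bs4 package) on Python 3, rather than the Beautiful Soup 3 / Python 2 session shown in the answer. The names PROFILE_HTML and profile_to_dict are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Same markup as in the question.
PROFILE_HTML = '''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
'''


def profile_to_dict(html):
    """Map each row's label (e.g. 'Member Since') to its value."""
    soup = BeautifulSoup(html, "html.parser")
    profile = {}
    # class_="profile-row" matches any div whose class list contains
    # "profile-row", so the extra "clearfix" class does not matter.
    for row in soup.find_all("div", class_="profile-row"):
        # Each row is expected to hold exactly two text nodes: label and value.
        field, value = row.find_all(string=True)
        profile[field.strip()] = value.strip()
    return profile


profile = profile_to_dict(PROFILE_HTML)
print(profile["Member Since"])  # January 2010
```

Looking up profile["Company"] would then return "Mad Dog Graphx", and the same dictionary could be passed to whatever storage layer is in use.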

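If there are other divs or other text nodes inside the "profile-row clearfix" div, unpacking the row's text nodes into exactly two names will fail, so you will need to do something like field = row.find('div', {'class':'profile-row-header'}).findAll(text=True)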
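
A hedged sketch of that more defensive approach, again assuming Beautiful Soup 4 on Python 3. The markup in MESSY_HTML is a made-up variation with extra whitespace, a nested tag, and an additional div, and extract_fields is an illustrative name, not something from the original answer.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Hypothetical variation of the question's markup, with extra nodes in the row.
MESSY_HTML = '''
<div class="profile-row clearfix">
  <div class="profile-row-header">Member <b>Since</b></div>
  <div class="profile-information">January 2010</div>
  <div class="profile-note">verified</div>
</div>
'''


def extract_fields(html):
    """Look up the header and value divs by class instead of by position."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for row in soup.find_all("div", class_="profile-row"):
        header = row.find("div", class_="profile-row-header")
        info = row.find("div", class_="profile-information")
        if header is None or info is None:
            continue  # skip rows that lack either part
        # Join all text nodes so nested tags such as <b> do not split the label.
        field = "".join(header.find_all(string=True)).strip()
        value = "".join(info.find_all(string=True)).strip()
        fields[field] = value
    return fields


fields = extract_fields(MESSY_HTML)
print(fields)  # {'Member Since': 'January 2010'}
```

Because the two divs are found by class, the whitespace between them and the extra "profile-note" div are simply ignored.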

Regarding python - Parsing HTML with Python and Beautiful Soup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6566000/
