python - Scraping contact information with Python and BeautifulSoup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:40:01

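I am trying to scrape contact information from a page. I need the name, job title, phone number, and email address.

I am learning Python and trying to write the code from what I know so far. I can pull out the div blocks that contain the individual contacts, but I don't know how to walk through them once I have them.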

tags = soup.find_all('div', attrs={'class':'tshowcase-inner-box'})

But then I tried to walk the child divs, without success.

fullname = soup.find('div', attrs={'class':'tshowcase-box-title'})
title = soup('div', attrs={'class':'tshowcase-single-position'})
phone = soup('div', attrs={'class':'tshowcase-single-telephone'})
email = soup('div', attrs={'class':'tshowcase-box-social'})

I'm not sure what to do next, and I'd appreciate any pointers.

Here is the sample HTML:

<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>

Best answer

If you loop over each listing, you can test whether each element is present and act accordingly:

from bs4 import BeautifulSoup as bs
import requests

html = '''
<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>
<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME2</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE2</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE2</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL2" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>
'''
soup = bs(html, 'lxml')
results = []

for listing in soup.select('.tshowcase-inner-box'):
    # Search inside the current listing, not the whole document
    name = listing.select_one('.tshowcase-box-title')
    job = listing.select_one('.tshowcase-single-position')
    tel = listing.select_one('.tshowcase-single-telephone')
    email = listing.select_one('[href^=mailto]')
    # select_one returns None when nothing matches, so test before reading
    if name is None:
        name = 'Not present'
    else:
        name = name.text
    if job is None:
        job = 'Not present'
    else:
        job = job.text
    if tel is None:
        tel = 'Not present'
    else:
        tel = tel.text
    if email is None:
        email = 'Not present'
    else:
        # The address is stored in the link's href, after the mailto: prefix
        email = email['href'].replace('mailto:', '')
    results.append({'name': name, 'job': job, 'tel': tel, 'email': email})
print(results)
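
The same per-listing check can be written more compactly by moving the "present or not" test into a small helper. This is only a sketch of that idea, not part of the original answer: the names `text_or_default`, `parse_contacts`, and `SAMPLE_HTML` are made up for illustration, the sample markup is a trimmed version of the HTML in the question, and the built-in `html.parser` is used here on the assumption that `lxml` may not be installed.

```python
from bs4 import BeautifulSoup

# Trimmed sample: the second listing deliberately lacks job, phone and email
SAMPLE_HTML = '''
<div class="tshowcase-inner-box ts-float-left ">
  <div class="tshowcase-box-title">FULL NAME</div>
  <div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
  <div class="tshowcase-single-telephone"><a href="tel:PHONE">PHONE</a></div>
  <div class="tshowcase-box-social"><a href="mailto:EMAIL"></a></div>
</div>
<div class="tshowcase-inner-box ts-float-left ">
  <div class="tshowcase-box-title">FULL NAME2</div>
</div>
'''


def text_or_default(parent, selector, default='Not present'):
    """Return the stripped text of the first match under parent, or default."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def parse_contacts(html):
    """Build one dict per contact block (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    contacts = []
    for listing in soup.select('.tshowcase-inner-box'):
        # The email lives in the href attribute, so it is handled separately
        link = listing.select_one('a[href^="mailto:"]')
        contacts.append({
            'name': text_or_default(listing, '.tshowcase-box-title'),
            'job': text_or_default(listing, '.tshowcase-single-position'),
            'tel': text_or_default(listing, '.tshowcase-single-telephone'),
            'email': link['href'][len('mailto:'):] if link is not None else 'Not present',
        })
    return contacts


contacts = parse_contacts(SAMPLE_HTML)
print(contacts)
```

The key point is the same as in the answer above: every lookup is made on `listing`, so a missing field in one contact cannot pick up a value from another contact's block.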

Regarding "python - Scraping contact information with Python and BeautifulSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56918360/
