
python - How do I get specific text in Python Beautiful Soup when there is no unique element?

Reposted. Author: 行者123. Last updated: 2023-12-04 09:46:16

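I only want to get the open-position text from this website: https://www.praeses.com/careers/. I copied and pasted the class below, but it pulls text from most of the site, because almost everything uses this class and there is no other unique data to target. How do I get just the open positions? Right now I basically get everything that has this "a class".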

<a class="et_pb_button et_pb_custom_button_icon et_pb_button_1 et_hover_enabled et_pb_bg_layout_dark" href="https://www.praeses.com/senior-national-accounts-manager/" data-icon="5">Senior National Accounts Manager</a>

import requests
from bs4 import BeautifulSoup

print("Praeses jobs:")
praeses_url = "https://www.praeses.com/careers/"
praeses_html_text = requests.get(praeses_url).text
praeses_soup = BeautifulSoup(praeses_html_text, 'html.parser')
# print(praeses_soup)
for job in praeses_soup.find_all('et_pb_button et_pb_custom_button_icon et_pb_button_1 et_hover_enabled et_pb_bg_layout_dark'):
    print(praeses_soup.text)

Best answer

You can use a CSS selector for this task.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.praeses.com/careers/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

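# Select links inside the <div> siblings that follow the <div> containing "Open Positions".
# Soup Sieve 2.1+ spells the pseudo-class :-soup-contains(); the older :contains() is deprecated.
for a in soup.select('div:-soup-contains("Open Positions") ~ div > a'):
    print('{:<40}{}'.format(a.get_text(strip=True), a['href']))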

Prints:
Senior National Accounts Manager        https://www.praeses.com/senior-national-accounts-manager/
National Accounts Manager               https://www.praeses.com/national-accounts-manager/
Cloud Architect                         https://www.praeses.com/cloud-architect/
Front-End Developer                     https://www.praeses.com/front-end-developer/
Senior Project Manager (GOV)            https://www.praeses.com/senior-project-manager-gov/
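
To see why the selector isolates the job links when the button class alone does not, here is a minimal offline sketch. The markup in SAMPLE_HTML and the helper names all_buttons and open_positions are assumptions made up for illustration; the real page's structure may differ. The sketch uses :-soup-contains(), the spelling Soup Sieve 2.1 and later prefer over the deprecated :contains().

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the anchor quoted in the question.
# It is NOT a copy of the real careers page.
SAMPLE_HTML = """
<div class="section">
  <div class="heading"><h2>Our Culture</h2></div>
  <div class="button-wrap"><a class="et_pb_button" href="/about/">Learn More</a></div>
</div>
<div class="section">
  <div class="heading"><h2>Open Positions</h2></div>
  <div class="button-wrap"><a class="et_pb_button" href="/cloud-architect/">Cloud Architect</a></div>
  <div class="button-wrap"><a class="et_pb_button" href="/front-end-developer/">Front-End Developer</a></div>
</div>
"""


def all_buttons(html):
    """Match by class only: returns every button on the page, wanted or not."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="et_pb_button")]


def open_positions(html):
    """Match by position: links in <div> siblings after the 'Open Positions' <div>."""
    soup = BeautifulSoup(html, "html.parser")
    # div:-soup-contains(...)  a <div> whose text includes the heading
    # ~ div                    any later sibling <div> of that <div>
    # > a                      an <a> that is a direct child of the sibling
    selector = 'div:-soup-contains("Open Positions") ~ div > a'
    return [(a.get_text(strip=True), a["href"]) for a in soup.select(selector)]


if __name__ == "__main__":
    print(all_buttons(SAMPLE_HTML))
    for title, href in open_positions(SAMPLE_HTML):
        print("{:<40}{}".format(title, href))
```

In this sample the class-only lookup also returns the unrelated "Learn More" button, while the sibling-based selector returns only the two links under the heading. Note that the first positional argument of find_all() is a tag name, so class matching needs the class_ keyword (or select() with a .class selector).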

Regarding "python - How do I get specific text in Python Beautiful Soup when there is no unique element?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62091080/
