
python - Scraping a JavaScript web page with BeautifulSoup and Python

Reposted. Author: 行者123. Updated: 2023-12-01 03:16:56

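I want to scrape content from the website http://www.jobs.ch. The result should be a script in which I can specify a job term, such as business analyst, and get all jobs that have that term in the title. I think I should use a multi-step approach: first collect all matching links from each results page, store them, and then extract the job descriptions.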

Is this a feasible way to achieve that? Or do I also need to use selenium, since the site is built with react.js?
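For the first step of that multi-step idea, a minimal sketch of building the search URL for a given job term and results page might look like this. It assumes the `term` and `page` query parameters that appear in the site's own pagination links; `search_url` and `BASE_URL` are hypothetical names used only for illustration, and the sketch targets Python 3.

```python
from urllib.parse import urlencode

# Search endpoint taken from the URL used in the question
BASE_URL = "http://www.jobs.ch/en/vacancies/"


def search_url(term, page=1):
    """Return the search-results URL for a job term and a page number."""
    # urlencode escapes the space in the term as "+"
    return BASE_URL + "?" + urlencode({"page": page, "term": term})


# Print the URLs of the first three result pages for one job term
for page in range(1, 4):
    print(search_url("business analyst", page))
```

Each of these URLs could then be fetched in turn and its links collected before any job description is downloaded.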

This is the start of my script:

from bs4 import BeautifulSoup
import urllib2

jobsFile = urllib2.urlopen("http://www.jobs.ch/en/vacancies/?term=business+analyst")
jobsHtml = jobsFile.read()
jobsFile.close()

soup = BeautifulSoup(jobsHtml)
jobsAll = soup.find_all("a")
for links in soup.find_all('a'):
    print (links.get('href'))

Console output:

python jobplatform.py
/Library/Python/2.7/site-packages/bs4/__init__.py:181: UserWarning:
No parser was explicitly specified, so I'm using the best available
HTML parser for this system ("lxml"). This usually isn't a problem,
but if you run this code on another system, or in a different virtual
environment, it may use a different parser and behave differently.

The code that caused this warning is on line 8 of the file
jobplatform.py. To get rid of this warning, change code that looks
like this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

markup_type=markup_type))
None
/en/
/en/login/
/en/register/
/en/vacancies/
/en/companies/
http://www.jobs.ch/en/sucheBerater.php
http://www.jobs.ch/en/tipps
http://www.jobs.ch/en/ecom/
/de/stellenangebote/?term=business
/fr/offres-emplois/?term=business
/en/vacancies/?term=business
/en/vacancies/
None
None
None
None
/en/vacancies/?page=1&term=business&web-results=1
None
None
/en/companies/79912-bayer-business-services-gmbh/
/en/vacancies/detail/7376115/?source=vacancy_search
/en/companies/79912-bayer-business-services-gmbh/
/en/companies/48196-kotra-korea-business-center/
/en/vacancies/detail/7397077/?source=vacancy_search
/en/companies/48196-kotra-korea-business-center/
/en/companies/66172-diwisa-distillerie-willisau-sa/
/en/vacancies/detail/7363589/?source=vacancy_search
/en/companies/66172-diwisa-distillerie-willisau-sa/
/en/companies/2859-paul-scherrer-institut/
/en/vacancies/detail/7359642/?source=vacancy_search
/en/companies/2859-paul-scherrer-institut/
/en/companies/49314-pit-offices-gmbh/
/en/vacancies/detail/7344672/?source=vacancy_search
/en/companies/49314-pit-offices-gmbh/
/en/companies/27786-zuehlke-engineering-ag/
/en/vacancies/detail/7176356/?source=vacancy_search
/en/companies/27786-zuehlke-engineering-ag/
/en/companies/1802-six-payment-services-ag/
/en/vacancies/detail/7396870/?source=vacancy_search
/en/companies/1802-six-payment-services-ag/
/en/companies/49420-mettler-toledo-gruppe/
/en/vacancies/detail/7384998/?source=vacancy_search
/en/companies/49420-mettler-toledo-gruppe/
/en/companies/16414-partners-group/
/en/vacancies/detail/7279253/?source=vacancy_search
/en/companies/16414-partners-group/
/en/companies/4005-johnson-johnson/
/en/vacancies/detail/7397184/?source=vacancy_search
/en/companies/4005-johnson-johnson/
/en/companies/44340-amgen/
/en/vacancies/detail/7359993/?source=vacancy_search
/en/companies/44340-amgen/
/en/companies/1802-six-payment-services-ag/
/en/vacancies/detail/7357631/?source=vacancy_search
/en/companies/1802-six-payment-services-ag/
/en/companies/16649-fritschi-unternehmensberatung-gmbh/
/en/vacancies/detail/7369054/?source=vacancy_search
/en/companies/16649-fritschi-unternehmensberatung-gmbh/
/en/companies/19002-hays-schweiz-ag/
/en/vacancies/detail/7389632/?source=vacancy_search
/en/companies/19002-hays-schweiz-ag/
/en/companies/5977-canon-schweiz-ag/
/en/vacancies/detail/7236919/?source=vacancy_search
/en/companies/5977-canon-schweiz-ag/
/en/companies/40039-vorwerk-international-strecker-co/
/en/vacancies/detail/7374142/?source=vacancy_search
/en/companies/40039-vorwerk-international-strecker-co/
/en/companies/2263-zuercher-kantonalbank/
/en/vacancies/detail/7299359/?source=vacancy_search
/en/companies/2263-zuercher-kantonalbank/
/en/companies/10673-accenture/
/en/vacancies/detail/6664788/?source=vacancy_search
/en/companies/10673-accenture/
/en/companies/38308-addexpert-gmbh/
/en/vacancies/detail/7386047/?source=vacancy_search
/en/companies/38308-addexpert-gmbh/
/en/companies/1802-six-swiss-exchange-ag/
/en/vacancies/detail/7357633/?source=vacancy_search
/en/companies/1802-six-swiss-exchange-ag/
/en/vacancies/?page=1&term=business
/en/vacancies/?page=2&term=business
/en/vacancies/?page=3&term=business
/en/vacancies/?page=4&term=business
/en/vacancies/?page=5&term=business
/en/vacancies/?page=6&term=business
/en/vacancies/?page=124&term=business
/en/vacancies/?page=2&term=business
None
http://jobcloud.ch/c/en/about/
http://jobcloud.ch/c/en/about/
http://jobcloud.ch/c/en/about/team/
http://jobcloud.ch/c/en/we-are-jobcloud/
None
http://www.jobs.ch/en/newest.php
http://www.jobs.ch/en/info.php?info=agb
http://www.jobs.ch/en/info.php?info=pp
None
http://jobcloud.ch/c/en/products/international-recruiting/
/en/
http://www.jobs.ch/en/sitemap.php
http://jobcloud.ch/c/en/about/contact/
http://jobcloud.ch/
http://www.facebook.com/jobs.ch
http://twitter.com/jobs_ch
http://www.xing.com/company/jobcloudag
http://www.youtube.com/jobspunktch
http://plus.google.com/113239437813300663024/
http://www.flickr.com/photos/jobsag
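
The output above mixes navigation, company, pagination, and vacancy links, plus `None` for anchors without an `href`. A rough sketch of keeping only the vacancy detail links and reading the highest page number could look like this. It assumes the `/en/vacancies/detail/<id>/` and `page=<n>` patterns seen in the output; `vacancy_links` and `last_page` are hypothetical helper names, and the sketch targets Python 3.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.jobs.ch"
# Patterns inferred from the console output; the site may change them
DETAIL_RE = re.compile(r"^/en/vacancies/detail/\d+/")
PAGE_RE = re.compile(r"[?&]page=(\d+)")


def vacancy_links(hrefs):
    """Return absolute vacancy-detail URLs, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # skip None values from anchors without an href
            continue
        if DETAIL_RE.match(href):
            url = urljoin(BASE_URL, href)
            if url not in seen:
                seen.add(url)
                result.append(url)
    return result


def last_page(hrefs):
    """Return the highest page number found in pagination links, or 1 if none."""
    highest = 1
    for href in hrefs:
        match = PAGE_RE.search(href) if href else None
        if match:
            highest = max(highest, int(match.group(1)))
    return highest


# A few hrefs copied from the console output
sample = [
    None,
    "/en/companies/79912-bayer-business-services-gmbh/",
    "/en/vacancies/detail/7376115/?source=vacancy_search",
    "/en/vacancies/detail/7376115/?source=vacancy_search",
    "/en/vacancies/?page=2&term=business",
    "/en/vacancies/?page=124&term=business",
]
print(vacancy_links(sample))
print(last_page(sample))
```

The list passed in would be the `href` values collected by the loop in the script.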

Best answer

As @Teemu Risikko noted in the comments, you can use dryscrape or selenium. Here is a solution using dryscrape:

from bs4 import BeautifulSoup
import dryscrape

my_url = "http://www.jobs.ch/en/vacancies/?term=business+analyst"

# Load the page in a dryscrape session so that its JavaScript runs
session = dryscrape.Session()
session.visit(my_url)
response = session.body()

# Name the parser explicitly to avoid the UserWarning shown in the question
soup = BeautifulSoup(response, "lxml")
for link in soup.find_all("a"):
    print(link.get("href"))

The dryscrape solution is quite simple, but installing the package can be tricky (use qt <=55)...
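
For the selenium route mentioned above, a minimal sketch might look like the following. It is not part of the original answer: it assumes selenium and a matching Chrome driver are installed, and `make_headless_chrome` and `fetch_rendered_html` are hypothetical names. The driver is passed in as a factory so that the fetching logic can be exercised without starting a real browser.

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes selenium and a Chrome driver)."""
    # Imported lazily so this module still loads when selenium is missing
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, driver_factory=make_headless_chrome):
    """Open url in a browser and return the page source the browser reports."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


# Example use (starts a real browser, so it is left commented out):
# html = fetch_rendered_html("http://www.jobs.ch/en/vacancies/?term=business+analyst")
```

The returned HTML can be handed to BeautifulSoup exactly as `session.body()` is in the dryscrape version. If the job list is filled in only after the initial load, an explicit wait before reading `page_source` may be needed.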

Regarding "python - Scraping a JavaScript web page with BeautifulSoup and Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42394621/
