
python - BeautifulSoup and Python: scraping around JS tags, maybe?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:42:43

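I'm trying to get multiple headlines, links, and dates, but I can only get the first one. I'm not sure why BS4 doesn't return all of the items... is this a JavaScript problem?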

from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen

html = urlopen("http://www.fiercepharma.com/news")
soup = BeautifulSoup(html.read().decode('utf-8'), "lxml")
main_div = soup.select_one("div#content")
div_sub = main_div.select("div.region.region-content")

for d in div_sub:
    date = d.time.get_text()
    headline = d.h2.a.get_text()
    url = d.a["href"]
    print(headline, url, date)
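
A plausible reason only the first item comes back, sketched on made-up markup: if `div.region.region-content` matches a single wrapper around all the articles, the loop runs once, and attribute access such as `d.h2` returns only the first matching descendant. The HTML below and the `card` class are assumptions for illustration, not the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one container wrapping several article cards.
html = """
<div id="content">
  <div class="region region-content">
    <div class="card"><h2><a href="/a">First</a></h2><time>Dec 15</time></div>
    <div class="card"><h2><a href="/b">Second</a></h2><time>Dec 14</time></div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Selecting the wrapper yields a single element, so the loop body runs once,
# and .h2 is shorthand for find('h2'), which returns only the first match.
containers = soup.select("div.region.region-content")
first_only = [c.h2.a.get_text() for c in containers]

# Selecting the repeated per-article element yields one entry per article.
cards = soup.select("div.region.region-content div.card")
all_titles = [card.h2.a.get_text() for card in cards]

print(first_only)
print(all_titles)
```

If this is what is happening, the fix is to iterate over the element that repeats once per article rather than over its parent.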

Best answer

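How about using the following to capture every article on the home page, including its link, author, and publication date? You can store the results in a dictionary, or in a pandas DataFrame for easier manipulation.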

from bs4 import BeautifulSoup
import requests

baseurl = 'http://www.fiercepharma.com'
response = requests.get(baseurl)

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(response.content, 'html.parser')

cdict = {}

# Each article sits in its own "card" div, so iterate over the cards.
for group in soup.find_all('div', {'class': 'card horizontal views-row'}):
    try:
        heading = group.find('h2', {'class': 'field-content list-title'})
        title = heading.text
        link = baseurl + heading.find('a', href=True)['href']
        byline = group.find('span', {'class': 'field-content'})
        author = byline.find('a').text
        time = byline.find('time').text
        content = group.find('p', {'class': 'field-content card-text'}).text
        cdict[link] = {'title': title, 'author': author, 'time': time, 'content': content}
    except (AttributeError, TypeError) as e:
        # A missing tag makes find() return None; skip that card and report it.
        print('[-] Unable to parse {}'.format(e))

print(cdict)
#{'http://www.fiercepharma.com/manufacturing/lonza-bulks-up-5-5b-deal-for-capsugel': {'author': u'Eric Palmer',
# 'content': u'Swiss CDMO Lonza has pulled the trigger on a $5.5 billion deal to acquire the U.S.-based contract capsule and drug producer Capsugel to create another sizable\u2026',
# 'time': u'Dec 15, 2016 8:45am',
# 'title': u'Lonza bulks up with $5.5B deal for Capsugel'},
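
For the pandas option mentioned above, a minimal sketch, assuming `cdict` has the shape printed here (link mapped to a dict of fields). The `content` value is a placeholder because the sample output truncates it, and the column name `link` is chosen for illustration.

```python
import pandas as pd

# Sample entry in the same shape as the printed output.
cdict = {
    "http://www.fiercepharma.com/manufacturing/lonza-bulks-up-5-5b-deal-for-capsugel": {
        "title": "Lonza bulks up with $5.5B deal for Capsugel",
        "author": "Eric Palmer",
        "time": "Dec 15, 2016 8:45am",
        "content": "(summary text)",  # placeholder, not the real summary
    },
}

# orient="index" turns each outer key (the link) into one row.
df = pd.DataFrame.from_dict(cdict, orient="index")

# Move the link out of the index into an ordinary column.
df.index.name = "link"
df = df.reset_index()

print(df[["title", "author", "time"]])
```

From there the rows can be filtered or sorted like any other DataFrame, for example `df[df["author"] == "Eric Palmer"]`.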

This question, "python - BeautifulSoup and Python: scraping around JS tags, maybe?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/41191937/
