
python - Understanding how to web scrape with BeautifulSoup


I am trying to scrape the data in the table that has "Period" and "Percent per annum" as its columns (table 4) at the URL:

My code is below, but I think I am confused about how to reference the row above the first date and the corresponding figure, which is why I get the error AttributeError: 'NoneType' object has no attribute 'getText' on the line row_name = row.findNext('td.header_units').getText().

from bs4 import BeautifulSoup
import urllib2

url = "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y"

content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)

desired_table = soup.findAll('table')[4]

# Find the columns you want data from
headers1 = desired_table.findAll('td.header_units')
headers2 = desired_table.findAll('td.header')
desired_columns = []
for th in headers1:  # I'm just working with `headers1` currently to see if I have the right idea
    desired_columns.append([headers1.index(th), th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    row_name = row.findNext('td.header_units').getText()
    for column in desired_columns:
        print(cells[column[0]].text.encode('ascii', 'ignore'), row_name.encode('ascii', 'ignore'), column[1].encode('ascii', 'ignore'))

Thanks
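(For reference, the AttributeError almost certainly comes from the selector syntax rather than from the rows themselves: find, findAll and findNext take a tag name plus optional attribute filters, not a CSS selector, so 'td.header_units' searches for a tag literally named td.header_units, matches nothing, and findNext(...) returns None. A minimal sketch of the two lookups that do work, using a made-up fragment of the table:

from bs4 import BeautifulSoup

# Made-up fragment standing in for one header cell of the ECB table.
html = '<table><tr><td class="header_units">Percent per annum</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Tag name plus an attribute filter works...
print(soup.find("td", {"class": "header_units"}).getText())   # Percent per annum
# ...and so does a real CSS selector via select(), which returns a list.
print(soup.select("td.header_units")[0].getText())            # Percent per annum
# The original call searches for a tag literally named "td.header_units",
# matches nothing and returns None, hence the AttributeError on .getText().
print(soup.find("td.header_units"))                           # None

The same applies to headers1 = desired_table.findAll('td.header_units') above, which silently evaluates to an empty list.)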

Best Answer

This pairs up all the elements into tuples:

from bs4 import BeautifulSoup
import requests

r = requests.get(
    "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y")
soup = BeautifulSoup(r.content)

# Every <tr> that follows the table's "header" cell, wrapped in an iterator so the header rows can be consumed with next().
data = iter(soup.find("table", {"class": "tablestats"}).find("td", {"class": "header"}).find_all_next("tr"))


headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]

for a, b in table_items:
    print(u"Period={}, Percent per annum={}".format(a, b if b.strip() else "null"))

Output:

Period=2015-06-09, Percent per annum=1.842026
Period=2015-06-08, Percent per annum=1.741636
Period=2015-06-07, Percent per annum=null
Period=2015-06-06, Percent per annum=null
Period=2015-06-05, Percent per annum=1.700042
Period=2015-06-04, Percent per annum=1.667431
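The one-liner works because find_all_next("tr") collects every row that appears after the table's "header" cell, iter/next pops the two header rows off the front, and each remaining row unpacks its two <td> cells into a (Period, value) pair. A self-contained sketch of the same pattern on a toy table (the HTML below is invented and only approximates the real ECB layout):

from bs4 import BeautifulSoup

# Invented stand-in for the ECB page: a "header" cell, two header-ish rows,
# then one data row per date with exactly two <td> cells.
html = """
<table class="tablestats">
  <tr><td class="header">Yield curve spot rate, 30-year maturity</td></tr>
  <tr><td>Period</td><td>Percent per annum</td></tr>
  <tr><td>(daily)</td><td>(end of period)</td></tr>
  <tr><td>2015-06-09</td><td>1.842026</td></tr>
  <tr><td>2015-06-07</td><td> </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <tr> that appears after the "header" cell, as an iterator so the
# two header rows can be consumed with next().
data = iter(soup.find("td", {"class": "header"}).find_all_next("tr"))

headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]

for a, b in table_items:
    print(u"Period={}, Percent per annum={}".format(a, b if b.strip() else "null"))

Swapping the toy string for r.content from the real page reproduces the output above. Newer versions of bs4 also warn when no parser is given, so passing one explicitly, e.g. BeautifulSoup(r.content, "html.parser"), keeps the script quiet.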

Regarding "python - Understanding how to web scrape with BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30761041/
