
javascript - Scraping weather data with Beautiful Soup 4 (the site is rendered with JavaScript)

Reposted · Author: 行者123 · Updated: 2023-12-03 01:17:16

I'm trying to scrape some weather data from wunderground.com using BeautifulSoup 4. I found a tutorial on how to do this, but it demonstrates the approach against static HTML source. When the tutorial was made, wunderground.com served plain HTML, but the site is now rendered with JavaScript.

I was able to take the code and adapt it to my specific data-retrieval needs, but I'm stuck on how to make it pull the JavaScript-rendered content instead of static HTML. Can anyone help with this?

The code is below; I took it from kiengiv of SAS Business Analytics on YouTube.

from bs4 import BeautifulSoup
import urllib3, csv, os, datetime, urllib3.request, re, sys

for vYear in range(2016, 2019):
    for vMonth in range(1, 13):
        for vDay in range(1, 32):
            # go to the next month, if it is a leap year and greater than the 29th
            # or if it is not a leap year and greater than the 28th
            if vYear % 4 == 0:
                if vMonth == 2 and vDay > 29:
                    break
            else:
                if vMonth == 2 and vDay > 28:
                    break
            # go to the next month, if it is april, june, september or november
            # and greater than the 30th
            if vMonth in [4, 6, 9, 11] and vDay > 30:
                break

            # defining the date string to export and go to the next day using the url
            theDate = str(vYear) + "/" + str(vMonth) + "/" + str(vDay)

            # the new url created after each day
            theurl = "https://www.wunderground.com/history/daily/us/ma/cambridge/KBOS/" + theDate + "date.html"
            # extract the source data for analysis
            http = urllib3.PoolManager()
            thepage = http.request('GET', theurl)
            soup = BeautifulSoup(thepage, "html.parser")
            MaxWindSpeed = Visibility = SeaLevelPressure = Precipitation = High_Temp = Low_Temp = Day_Average_Temp = "N/A"
            for temp in soup.find_all('tr'):
                if temp.text.strip().replace('\n', '')[:6] == 'Actual' or temp.text.strip().replace('\n', '')[-6:] == "Record":
                    pass
                elif temp.text.replace('\n', '')[-7:] == "RiseSet":
                    break
                elif temp.find_all('td')[0].text == "Day Average Temp":
                    if temp.find_all('td')[1].text.strip() == "-":
                        Mean = "N/A"
                    else:
                        Mean = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "High Temp":
                    if temp.find_all('td')[1].text.strip() == "-":
                        Max = "N/A"
                    else:
                        Max = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Low Temp":
                    if temp.find_all('td')[1].text.strip() == "-":
                        Min = "N/A"
                    else:
                        Min = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Growing Degree Days":
                    if temp.find_all('td')[1].text.strip() == "-":
                        GrowingDegreeDays = "N/A"
                    else:
                        GrowingDegreeDays = temp.find_all('td')[1].text
                elif temp.find_all('td')[0].text == "Heating Degree Days":
                    if temp.find_all('td')[1].text.strip() == "-":
                        HeatingDegreeDays = "N/A"
                    else:
                        HeatingDegreeDays = temp.find_all('td')[1].text
                elif temp.find_all('td')[0].text == "Dew Point":
                    if temp.find_all('td')[1].text.strip() == "-" or temp.find_all('td')[1].text.strip() == "":
                        DewPoint = "N/A"
                    else:
                        DewPoint = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Precipitation" and temp.find_all('td')[1].text.strip() != "":
                    if temp.find_all('td')[1].text.strip() == "-" or temp.find_all('td')[1].text.strip() == "":
                        Precipitation = "N/A"
                    else:
                        Precipitation = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Sea Level Pressure" and temp.find_all('td')[1].text.strip() != "":
                    if temp.find_all('td')[1].text.strip() == "-":
                        SeaLevelPressure = "N/A"
                    else:
                        SeaLevelPressure = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Max Wind Speed":
                    if temp.find_all('td')[1].text.strip() == "-" or temp.find_all('td')[1].text.strip() == "":
                        MaxWindSpeed = "N/A"
                    else:
                        MaxWindSpeed = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                elif temp.find_all('td')[0].text == "Visibility":
                    if temp.find_all('td')[1].text.strip() == "-":
                        Visibility = "N/A"
                    else:
                        Visibility = temp.find_all('td')[1].find(attrs={"<td _ngcontent-c7": "</td>"}).text
                    break

            # combining the values to be written to the CSV file
            CombinedString = theDate + "," + Mean + "," + Max + "," + Min + "," + HeatingDegreeDays + "," + DewPoint + "," + "," + Precipitation + "," + SeaLevelPressure + "," + MaxWindSpeed + "," + Visibility + "," + Events + "\n"
            file.write(bytes(CombinedString, encoding="ascii", errors='ignore'))

            # printing to help with any debugging and tracking progress
            print(CombinedString)

file.close()
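As a side note, the nested leap-year and month-length checks at the top of the loops can be delegated to the standard library: constructing a datetime.date from an impossible combination raises ValueError. A minimal sketch, assuming the same 2016-2018 range as the loops above:

```python
from datetime import date

valid_dates = []
for vYear in range(2016, 2019):
    for vMonth in range(1, 13):
        for vDay in range(1, 32):
            try:
                date(vYear, vMonth, vDay)  # rejects e.g. Feb 30 or Apr 31
            except ValueError:
                continue  # skip impossible dates instead of hand-coding leap-year rules
            valid_dates.append(f"{vYear}/{vMonth}/{vDay}")

print(len(valid_dates))  # 366 + 365 + 365 days
```

This removes the manual February/30-day-month bookkeeping and can't drift out of sync with the calendar.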

Best Answer

You can't scrape that data with BeautifulSoup alone unless you use Selenium. Instead, I found several JSON endpoints that contain the data you need (I'm not certain about this, since I don't know exactly which fields you want).

You can find all the JSON requests in the developer console (F12):

[Screenshot: developer console Network tab listing the JSON requests, with the historical.json request highlighted]

In particular, I found this one (highlighted in the picture): https://api.weather.com/v1/geocode/42.36416626/-71.00499725/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&startDate=20160810&endDate=20160810&units=e

You can iterate over it by changing startDate and endDate. You can also change the geolocation coordinates that follow "geocode" in the URL.
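That iteration can be sketched as plain string substitution on the endpoint above (the apiKey and coordinates are taken verbatim from the URL in this answer; substitute your own values as needed):

```python
from datetime import date, timedelta

# Endpoint from the answer, with startDate/endDate left as a placeholder.
BASE = ("https://api.weather.com/v1/geocode/42.36416626/-71.00499725/"
        "observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e"
        "&startDate={d}&endDate={d}&units=e")

def daily_urls(start, end):
    """Yield one historical.json URL per day in the closed range [start, end]."""
    day = start
    while day <= end:
        yield BASE.format(d=day.strftime("%Y%m%d"))  # API expects YYYYMMDD
        day += timedelta(days=1)

urls = list(daily_urls(date(2016, 8, 10), date(2016, 8, 12)))
print(len(urls))  # 3
```

Requesting one day per URL keeps each response small and makes failed days easy to retry.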

To fetch the JSON, you can use urllib3 together with the json library:

import urllib3
import json

url = "https://api.weather.com/v1/geocode/42.36416626/-71.00499725/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&startDate=20160810&endDate=20160810&units=e"

http = urllib3.PoolManager()
r = http.request(
    'GET',
    url,
    headers={
        'Accept': 'application/json'
    })
data = json.loads(r.data.decode('utf-8'))
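Once decoded, the payload appears to carry the readings as a list under an "observations" key; the field names used below ("valid_time_gmt", "temp") are an assumption based on inspecting one response and should be verified in the developer console. A sketch of the post-processing, run here on a hard-coded sample payload rather than a live request:

```python
import json

# Assumed response shape -- confirm the real field names against an
# actual historical.json response before relying on them.
sample = json.loads('''
{"observations": [
    {"valid_time_gmt": 1470805200, "temp": 71},
    {"valid_time_gmt": 1470808800, "temp": 73}
]}
''')

temps = [obs["temp"] for obs in sample.get("observations", [])]
print(max(temps))  # highest temperature among the day's readings
```

With the live response, replace `sample` with the dict returned by `json.loads(r.data.decode('utf-8'))` from the snippet above.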

Regarding "javascript - Scraping weather data with Beautiful Soup 4 (the site is rendered with JavaScript)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51949091/
