python - Unable to fetch two fields from a webpage in the desired way

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:25:39

I've written a Python script to fetch two fields, time and currency (columns 2 and 3), from a table on a webpage. The script returns results, but not in the form I want.

Website address

This is what I've written so far:

import requests
from bs4 import BeautifulSoup

URL = "https://www.forexfactory.com/calendar.php?week=this"

res = requests.get(URL)
soup = BeautifulSoup(res.text, "lxml")

for item in soup.select("tr.calendar_row"):
    ftime = item.select_one("td.calendar__time").get_text(strip=True)
    currency = item.select_one("td.calendar__currency").get_text(strip=True)
    print(ftime, currency)

The output I'm getting:

All Day JPY
5:00am CNY
CNY
2:00pm USD
1:59am JPY
2:00am EUR
EUR
4:30am GBP
GBP
GBP

Expected output:

All Day JPY
3:00pm CNY
3:00pm CNY
2:00pm USD
1:59am JPY
12:00pm EUR
12:00pm EUR
2:30pm GBP
2:30pm GBP
2:30pm GBP

The times I'm getting differ from those shown on the site. I'd also like to fill each blank time with the value from the preceding row.

How can I modify my existing script to get the results shown above?

Best answer

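The problem is the empty cells in the time column: rows that share a time with the row above leave the cell blank, so the last non-empty time has to be carried forward. The script below also sends cookies that set the timezone and time format the site uses when rendering times.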

import requests
from bs4 import BeautifulSoup

URL = "https://www.forexfactory.com/calendar.php?week=this"

# Cookie dictionary for setting the timezone
cookies = {
    "fftimezoneoffset": "0",  # timezone / UTC +/-X
    "fftimeformat": "1",      # format: 0 = am/pm, 1 = 24-hour
    "ffdstonoff": "1",        # daylight saving
    "ffverifytimes": "1",     # set times to timezone
}
res = requests.get(URL, cookies=cookies)  # apply timezone settings
soup = BeautifulSoup(res.text, "lxml")
lastTime = ""  # last non-empty time, reused for rows with an empty time cell
for item in soup.select("tr.calendar_row"):
    ftime = item.select_one("td.calendar__time").get_text(strip=True)
    if len(ftime) == 0:  # if the time is empty, use the last one
        ftime = lastTime
    lastTime = ftime
    currency = item.select_one("td.calendar__currency").get_text(strip=True)
    if len(currency) > 0:  # print only if there is a currency
        print(ftime, currency)
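
The fill-forward step in the answer can be tried without hitting the site. The following is a minimal sketch, assuming the rows have already been extracted as (time, currency) string pairs; the function name `fill_forward_times` and the `sample_rows` list are hypothetical and used only for illustration, with the sample values taken from the output shown in the question.

```python
from typing import Iterable, Iterator, Tuple


def fill_forward_times(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (time, currency) pairs, replacing an empty time with the
    most recent non-empty one and skipping rows without a currency."""
    last_time = ""  # most recent non-empty time seen so far
    for ftime, currency in rows:
        if ftime:
            last_time = ftime      # remember this time for later blank rows
        else:
            ftime = last_time      # blank cell: reuse the previous time
        if currency:               # skip rows that carry no currency
            yield ftime, currency


# Hypothetical sample, mirroring the first rows of the question's output
sample_rows = [
    ("All Day", "JPY"),
    ("5:00am", "CNY"),
    ("", "CNY"),
    ("2:00pm", "USD"),
]

for ftime, currency in fill_forward_times(sample_rows):
    print(ftime, currency)
# All Day JPY
# 5:00am CNY
# 5:00am CNY
# 2:00pm USD
```

Keeping this logic in a small generator separates it from the network request, so it can be checked on fixed data before being applied to the cell text extracted with `select_one(...).get_text(strip=True)`.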

Regarding "python - Unable to fetch two fields from a webpage in the desired way", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57479107/
