
python - Real-time web scraping

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:25:56

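I am currently using BeautifulSoup for web scraping. The script fetches an XML feed and writes the result to a CSV file, as shown in the code below. The website updates every 5 minutes, so what can I do to make the scraping run in real time?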

import csv
import requests
from bs4 import BeautifulSoup

url = 'http://www.dublincity.ie/dublintraffic/cpdata.xml?1543254514266'

res = requests.get(url)
soup = BeautifulSoup(res.content, "xml")
data = []
for item in soup.select("carpark"):
    ditem = {}
    ditem['Name'] = item.get("name")
    ditem['Spaces'] = item.get("spaces")
    data.append(ditem)

with open("xmldocs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, ["Name", "Spaces"])
    writer.writeheader()
    for info in data:
        writer.writerow(info)

Best answer

You can wrap the scraping code in a while loop and add a 5-minute sleep at the end of each iteration.

Using your example, this would be:

import csv
import requests
from bs4 import BeautifulSoup
import time

while True:
    url = 'http://www.dublincity.ie/dublintraffic/cpdata.xml?1543254514266'

    # Fetch and parse the current state of the XML feed.
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "xml")
    data = []
    for item in soup.select("carpark"):
        ditem = {}
        ditem['Name'] = item.get("name")
        ditem['Spaces'] = item.get("spaces")
        data.append(ditem)

    # Overwrite the CSV file with the latest snapshot.
    with open("xmldocs.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, ["Name", "Spaces"])
        writer.writeheader()
        for info in data:
            writer.writerow(info)

    # Wait 5 minutes before fetching again.
    time.sleep(5 * 60)
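
One caveat with the loop above is that a single failed request or malformed response raises an exception and ends the script. The following is only a rough sketch of one way to keep the loop alive, not part of the original answer. To stay free of third-party dependencies it swaps in the standard library's urllib and xml.etree.ElementTree for requests and BeautifulSoup, and the function names (fetch_xml, parse_carparks, write_csv, poll) and the max_runs parameter are hypothetical. It assumes, as the original code implies, that the feed contains carpark elements with name and spaces attributes.

```python
import csv
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

URL = "http://www.dublincity.ie/dublintraffic/cpdata.xml?1543254514266"


def fetch_xml(url=URL, timeout=10):
    """Download the raw XML feed; network failures surface as OSError."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_carparks(xml_bytes):
    """Return one {'Name', 'Spaces'} dict per <carpark> element."""
    root = ET.fromstring(xml_bytes)
    return [
        {"Name": node.get("name"), "Spaces": node.get("spaces")}
        for node in root.iter("carpark")
    ]


def write_csv(rows, path="xmldocs.csv"):
    """Overwrite the CSV file with the latest snapshot."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, ["Name", "Spaces"])
        writer.writeheader()
        writer.writerows(rows)


def poll(fetch=fetch_xml, interval=5 * 60, max_runs=None,
         sleep=time.sleep, path="xmldocs.csv"):
    """Fetch, parse and save every `interval` seconds.

    A failed update is reported and skipped, so the previous CSV stays
    in place. `max_runs=None` means run until interrupted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            write_csv(parse_carparks(fetch()), path)
        except (OSError, ET.ParseError) as exc:
            print(f"update skipped: {exc}")
        runs += 1
        # Sleep only for what is left of the interval, so the time spent
        # downloading does not push later updates further and further back.
        sleep(max(0.0, interval - (time.monotonic() - started)))
    return runs
```

Calling poll() with no arguments runs until the process is interrupted, while something like poll(max_runs=3, interval=10) gives a short trial run. The fetch and sleep parameters exist only so the loop can be exercised without network access or real waiting.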

For more on python - real-time web scraping, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/53498715/
