
python - Scraping data from Vivino.com

Reposted. Author: 行者123. Last updated: 2023-12-04 14:13:51

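I've been lurking here for a long time, and this community has always helped me. Thank you all.

I'm trying to collect data from vivino.com, but the DataFrame comes out empty. I can see that my soup is picking up the site's content, but I can't see where my mistake is.

My code: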


import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}

r = requests.get("https://www.vivino.com/explore?e=eJzLLbI1VMvNzLM1UMtNrLA1NTBQS660DQhRS7Z1DQ1SKwDKpqfZliUWZaaWJOao5SfZFhRlJqeq5dsmFierlZdExwJVJFcWA-mCEgC1YxlZ", headers=headers)#, proxies=proxies)
content = r.content
soup = BeautifulSoup(content, "html.parser")

Since I need the winery, the wine name, and the rating, this is what I tried:

for d in soup.find_all('div', attrs={'class':'explorerCard__titleColumn--28kWX'}):
    # find() returns the first matching tag or None;
    # find_all() returns a list, which is never None and has no .text
    Winery = d.find("a", attrs={"class":"VintageTitle_winery--2YoIr"})
    Wine = d.find("a", attrs={"class":"VintageTitle_wine--U7t9G"})
    Rating = d.find("div", attrs={"class":"VivinoRatingWide_averageValue--1zL_5"})
    num_Reviews = d.find("div", attrs={"class":"VivinoRatingWide__basedOn--s6y0t"})
    Stars = d.find("div", attrs={"aria-label":"rating__rating--ZZb_x rating__vivino--1vGCy"})

    alll = []

    if Winery is not None:
        #print(Winery.text)
        alll.append(Winery.text)
    else:
        alll.append("unknown-winery")

    if Wine is not None:
        #print(Wine.text)
        alll.append(Wine.text)
    else:
        alll.append("0")

    if Rating is not None:
        #print(Rating.text)
        alll.append(Rating.text)
    else:
        alll.append("0")
...
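
For reference, here is a small sketch of how Beautiful Soup's find and find_all behave when nothing matches, since the None checks in the loop depend on that difference. The HTML snippet, the class names, and the text_or_default helper are made up for illustration and are not taken from the Vivino page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for demonstrating the two lookup methods.
html = '<div class="card"><a class="winery">Example Winery</a></div>'
card = BeautifulSoup(html, "html.parser").find("div", class_="card")

# find_all() always returns a list-like ResultSet, even when empty,
# so an "is not None" check on it is always true.
no_matches = card.find_all("a", class_="missing")

# find() returns a single Tag, or None when nothing matches.
missing = card.find("a", class_="missing")
winery = card.find("a", class_="winery")


def text_or_default(tag, default):
    """Return the tag's stripped text, or the default when the tag is None."""
    return tag.get_text(strip=True) if tag is not None else default


print(text_or_default(winery, "unknown-winery"))
print(text_or_default(missing, "unknown-winery"))
```

With find, the None check does what the loop intends, and the default value is used whenever a card lacks the element.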

Then I put the data into a DataFrame:
results = []
for i in range(1, no_pages+1):
    results.append(get_data())
flatten = lambda l: [item for sublist in l for item in sublist]
df = pd.DataFrame(flatten(results),columns=['Winery','Wine','Rating','num_review', 'Stars'])
df.to_csv('redwines.csv', index=False, encoding='utf-8')

Thanks, everyone.

Best answer

The previous answer is correct, but the request needs the User-Agent header to be set:

import requests
import pandas as pd

r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        "country_code": "FR",
        "country_codes[]": "pt",
        "currency_code": "EUR",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "500",
        "price_range_min": "0",
        "wine_type_ids[]": "1"
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    }
)
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
        t["vintage"]["statistics"]["ratings_average"],
        t["vintage"]["statistics"]["ratings_count"]
    )
    for t in r.json()["explore_vintage"]["matches"]
]
dataframe = pd.DataFrame(results, columns=['Winery', 'Wine', 'Rating', 'num_review'])

print(dataframe)

You will need to increment the page parameter to fetch the subsequent pages of results.
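
One way the paging could look is sketched below. It reuses the URL, query parameters, and JSON keys from the answer's code. The names rows_from_payload, fetch_all, and max_pages are hypothetical, and stopping on an empty matches list is an assumption about how the API signals the end of the results, not documented behaviour. The session argument is expected to be a requests.Session (or anything with a compatible get method).

```python
API_URL = "https://www.vivino.com/api/explore/explore"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
}
# Same filters as in the answer; "page" is added per request.
BASE_PARAMS = {
    "country_code": "FR",
    "country_codes[]": "pt",
    "currency_code": "EUR",
    "grape_filter": "varietal",
    "min_rating": "1",
    "order_by": "price",
    "order": "asc",
    "price_range_max": "500",
    "price_range_min": "0",
    "wine_type_ids[]": "1",
}


def rows_from_payload(payload):
    """Turn one decoded JSON response into (winery, wine, rating, count) rows."""
    return [
        (
            t["vintage"]["wine"]["winery"]["name"],
            f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
            t["vintage"]["statistics"]["ratings_average"],
            t["vintage"]["statistics"]["ratings_count"],
        )
        for t in payload["explore_vintage"]["matches"]
    ]


def fetch_all(session, max_pages):
    """Request pages 1..max_pages and collect their rows in order."""
    rows = []
    for page in range(1, max_pages + 1):
        response = session.get(
            API_URL,
            params={**BASE_PARAMS, "page": page},
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        page_rows = rows_from_payload(response.json())
        if not page_rows:
            # Assumption: an empty page means there are no further results.
            break
        rows.extend(page_rows)
    return rows
```

A call such as fetch_all(requests.Session(), max_pages=5) returns a flat list of tuples that can be passed to pd.DataFrame with the same four column names as above.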

Regarding python - Scraping data from Vivino.com, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62216146/
