
python - Scraping "Compare Vintages" from Vivino

Reposted. Author: 行者123. Updated: 2023-12-04 14:02:23

I am trying to scrape data from Vivino. So far I have managed to call the API and read from the JSON response, following this post: https://stackoverflow.com/a/62224619/7575172

import pandas as pd
import requests

r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        "country_code": "DK",
        "country_codes[]": "fr",
        "currency_code": "DKK",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "500",
        "price_range_min": "0",
        "wine_type_ids[]": "1",
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Build one tuple per match returned by the explore endpoint
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        # t["vintage"]["wine"]["region"]["name_en"],
        t["vintage"]["wine"]["style"]["region"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["vintage"]["statistics"]["ratings_count"],
        t["price"]["amount"],
    )
    for t in r.json()["explore_vintage"]["matches"]
]
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Wine", "Year", "Country", "Region", "Rating", "num_review", "Price"],
)

# print(dataframe)
print(dataframe[["Winery", "Year", "Region", "Rating", "num_review", "Price"]])
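
Chained lookups such as t["vintage"]["wine"]["style"]["region"]["name"] raise an exception as soon as one level is missing or null. The following is a minimal sketch of a more defensive row builder, assuming (the post does not confirm this) that some matches may lack fields such as "style". The helper names dig and row_from_match are hypothetical, and the sample record is synthetic, not real Vivino data.

```python
from functools import reduce


def dig(record, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing or None."""
    def step(node, key):
        return node.get(key) if isinstance(node, dict) else None

    value = reduce(step, keys, record)
    return default if value is None else value


def row_from_match(match):
    """Flatten one entry of explore_vintage.matches into a tuple for the DataFrame."""
    vintage = match.get("vintage") or {}
    return (
        dig(vintage, "wine", "winery", "name"),
        dig(vintage, "wine", "name"),
        dig(vintage, "year"),
        dig(vintage, "wine", "region", "country", "name"),
        dig(vintage, "wine", "style", "region", "name"),
        dig(vintage, "statistics", "ratings_average"),
        dig(vintage, "statistics", "ratings_count"),
        dig(match, "price", "amount"),
    )


# Synthetic record for illustration only; "style" is deliberately null here.
sample = {
    "vintage": {
        "year": 2019,
        "wine": {
            "name": "Example Rouge",
            "winery": {"name": "Example Winery"},
            "region": {"country": {"name": "France"}},
            "style": None,
        },
        "statistics": {"ratings_average": 4.1, "ratings_count": 120},
    },
    "price": {"amount": 199.0},
}
print(row_from_match(sample))
```

With a helper like this, the list comprehension could become [row_from_match(t) for t in r.json()["explore_vintage"]["matches"]], leaving None in the columns where the response has no value.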

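However, I cannot find data in any of the JSON files describing the other available vintages of the same wine. For example, I am looking at the 2019 vintage, but data also exists for 2015-2020.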

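I used the network monitor in Firefox to inspect the other JSON files that are sent when the page below is opened. As far as I can tell, none of them contains the full list of available vintages?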

An example of the section I want to scrape (the "Compare Vintages" list) can be seen here: https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846?ref=nav-search#vintageListSection

Best answer

The data is in the page's JavaScript, under the window.__PRELOADED_STATE__.winePageInformation object, like this:

<script>
window.__PRELOADED_STATE__ = ....
window.__PRELOADED_STATE__.winePageInformation = { very long JSON here }
</script>

You can extract it with a regular expression; the result appears to be valid JSON:

import json
import re

import requests

url = "https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846"
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Capture the object literal assigned to winePageInformation
res = re.search(
    r"^.*window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*(.*});",
    r.text,
    re.MULTILINE,
)
if res is None:
    raise ValueError("winePageInformation was not found in the page source")
data = json.loads(res.group(1))

print("recommended vintages")
print(data["recommended_vintages"])

print("all vintages")
print(data["wine"]["vintages"])
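
The pattern (.*}); relies on the object ending with }; on the same line as the assignment. As a hedged alternative, the sketch below only uses a regular expression to find the assignment and then lets json.JSONDecoder.raw_decode parse a single JSON value from that position, ignoring whatever follows. It assumes, as the answer observes, that the assigned value is valid JSON. The function name extract_wine_page_info is hypothetical, the sample page is synthetic, and the "year" key inside each vintage is an assumption made for illustration.

```python
import json
import re

ASSIGNMENT = re.compile(r"window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*")


def extract_wine_page_info(html):
    """Return the object assigned to winePageInformation, or None if it is absent."""
    match = ASSIGNMENT.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and ignores
    # what comes after it (here the trailing ";" and the rest of the page).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Synthetic page for illustration; the real payload is much larger, and the
# per-vintage "year" key is an assumption, not taken from the original post.
sample_html = """
<script>
window.__PRELOADED_STATE__ = {};
window.__PRELOADED_STATE__.winePageInformation = {"wine": {"vintages": [{"year": 2019}, {"year": 2018}]}, "recommended_vintages": []};
</script>
"""

info = extract_wine_page_info(sample_html)
years = [v["year"] for v in info["wine"]["vintages"]]
print(years)
```

Against the live page, the same function would be called as extract_wine_page_info(r.text); inspect the returned dictionary before relying on any particular key.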

Regarding "python - Scraping "Compare Vintages" from Vivino", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69577385/
