
python - Trying to scrape the address of any place or restaurant from the Google front page, but unfortunately

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:00:14

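I am trying to scrape a restaurant's address from the info panel on the Google front page, but I get "urllib.error.HTTPError: HTTP Error 403: Forbidden" and the program does not run. I am fairly new to web scraping with Python; please help.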

    import json
    import sys
    import urllib.request
    import warnings

    from bs4 import BeautifulSoup

    if not sys.warnoptions:
        warnings.simplefilter("ignore")

    # Google search URL for the restaurant.
    url = "https://www.google.com/search?q=barbeque%20nation%20-%20noida"
    request = urllib.request.Request(url)
    # This call raises urllib.error.HTTPError: HTTP Error 403: Forbidden.
    response = urllib.request.urlopen(request)

    # Raw bytes of the page; also written to disk below.
    html = response.read()

    soup = BeautifulSoup(html, "html.parser")

    hotel_json = {}

    # Read the first JSON-LD block and copy the address out of it.
    for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
        details = json.loads(script.text.strip())
        # Assumes the JSON-LD block carries a "name" field for the file names.
        hotel_json["name"] = details["name"]
        hotel_json["address"] = {"LrzXr": details["address"]["streetAddress"]}
        break

    with open(hotel_json["name"] + ".html", "wb") as file:
        file.write(html)

    with open(hotel_json["name"] + ".json", "w") as outfile:
        json.dump(hotel_json, outfile, indent=4)

Best answer

Add a User-Agent header to the request:

    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
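
For context, `urllib` identifies itself as `Python-urllib/<version>` when no User-Agent is set, and that default is presumably what the server rejects. The sketch below wraps the one-line fix in two small helpers. The names `build_request`, `fetch`, and `DEFAULT_USER_AGENT` are hypothetical and not part of the original answer; only the `User-Agent` header itself comes from it. It also catches `HTTPError`, so a refusal is reported instead of crashing the script.

```python
import urllib.error
import urllib.request

# Hypothetical names for illustration; only the header value is from the answer.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server refused the request despite the header.
        print(f"HTTP {err.code}: {err.reason}")
        return None
```

Calling `fetch(url)` with the URL from the question returns the page bytes, or `None` if the server still answers with an error status. Whether the returned HTML contains the `application/ld+json` block the question looks for is a separate matter that this sketch does not check.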

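Regarding "python - Trying to scrape the address of any place or restaurant from the Google front page, but unfortunately", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55756817/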
