
python - Web scraper returns multiple errors

Reposted. Author: 行者123. Updated: 2023-12-01 00:40:13

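I'm building a web scraper for an insurance website. It retrieves the model, brand, sub-brand, and description and outputs them as CSV. When I run my code, it sometimes works and sometimes fails with several errors ("list indices must be integers", "Expecting value: line 1 column 1", "JSON decoder not working").

I tried inserting print statements to see where the problem is, but I still can't figure it out.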

import requests
import time
import json


session = requests.Session()
request_marcas = session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/brands-subbrands')
data = request_marcas.json()
fileCSV = open("webscraper_test.csv", "a")
fileCSV.write('Modelo' + ';' + 'ID_Marca' + ";" + 'ID_Submarca' + ";" + "ID_Tipo" + ";" + "Marca" +";"+ "Tipo"+ 'Descripcion' + "\n")

for i in range(2019, 2020):
    for marca in data['MARCA']:
        for submarca in marca['SUBMARCAS']:
            modelos = []
            modelos.append('https://www.citibanamexchubb.com/api/chubbnet/auto/models/' + marca['ID'] + '/' + submarca['ID'] + '/' + str(i))
            for link in modelos:
                json_link = []
                request_link = session.get(link).json()
                json_link.append(request_link)
                #print(request_link)
                for desc_id in request_link['TIPO']:
                    #print(desc_id['ID'])
                    desc_detail = []
                    desc_detail.append(session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + desc_id['ID'] + '/2018').json())
                    #print(desc_detail)
                    try:
                        for desc in desc_detail['DESCRIPCION']:
                            print(desc['DESC'])
                    except Exception as e:
                        None

Best answer

So, the auto/models endpoint you're scraping has some odd inconsistencies. For example, https://www.citibanamexchubb.com/api/chubbnet/auto/models/7/8/2019 returns this:

{
    "TIPO": {
        "ID": "381390223",
        "DESC": "MINI COOPER"
    }
}

Meanwhile, https://www.citibanamexchubb.com/api/chubbnet/auto/models/1/1/2019 returns this:

{
    "TIPO": [
        {
            "ID": "364026215",
            "DESC": "MDX"
        },
        {
            "ID": "364026216",
            "DESC": "RDX"
        },
        {
            "ID": "364031544",
            "DESC": "ILX"
        },
        {
            "ID": "364031613",
            "DESC": "TLX"
        },
        {
            "ID": "364031674",
            "DESC": "NSX"
        }
    ]
}

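So in the first response "TIPO" is a dict, while in the second it is a list. I modified your script so that it runs without raising any errors. I'm sure it isn't exactly what you want, but it at least handles the difference between the two types: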

import requests
import time
import json


session = requests.Session()
request_marcas = session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/brands-subbrands')
data = request_marcas.json()
fileCSV = open("webscraper_test.csv", "a")
fileCSV.write('Modelo' + ';' + 'ID_Marca' + ";" + 'ID_Submarca' + ";" + "ID_Tipo" + ";" + "Marca" +";"+ "Tipo"+ 'Descripcion' + "\n")

for i in range(2019, 2020):
    for marca in data['MARCA']:
        for submarca in marca['SUBMARCAS']:
            modelos = []
            modelos.append('https://www.citibanamexchubb.com/api/chubbnet/auto/models/' + marca['ID'] + '/' + submarca['ID'] + '/' + str(i))
            for link in modelos:
                json_link = []
                request_link = session.get(link).json()
                json_link.append(request_link)
                #print(request_link)

                # here's where I've made some changes:
                desc_detail = []
                if isinstance(request_link['TIPO'], dict):
                    desc_detail.append(session.get(
                        'https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + request_link['TIPO']['ID'] + '/2018').json())
                    print(request_link['TIPO']['DESC'])
                elif isinstance(request_link['TIPO'], list):
                    for item in request_link['TIPO']:
                        desc_detail.append(session.get('https://www.citibanamexchubb.com/api/chubbnet/auto/descriptions/' + item['ID'] + '/2018').json())
                        print(item['DESC'])
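
One way to avoid the two isinstance branches is to normalize "TIPO" into a list before looping. Iterating over a dict yields its keys (strings), which is why indexing each element with ['ID'] fails when the endpoint returns a single object. The sketch below is only an illustration: the helper name as_list is hypothetical, and the sample payloads are abbreviated copies of the two responses shown above, so it runs without touching the network.

```python
def as_list(value):
    """Return a list as-is, wrap a single dict in a list, otherwise return []."""
    if isinstance(value, list):
        return value
    if isinstance(value, dict):
        return [value]
    return []


# Sample payloads abbreviated from the two responses shown above.
single = {"TIPO": {"ID": "381390223", "DESC": "MINI COOPER"}}
multiple = {"TIPO": [{"ID": "364026215", "DESC": "MDX"},
                     {"ID": "364026216", "DESC": "RDX"}]}

for payload in (single, multiple):
    # .get() avoids a KeyError if a response has no "TIPO" key at all.
    for item in as_list(payload.get("TIPO")):
        print(item["ID"], item["DESC"])
```

With this helper, the loop body in the script could stay the same for both response shapes.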

Hope that helps!

Regarding "python - Web scraper returns multiple errors", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57415876/
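
As for the "Expecting value: line 1 column 1" error in the question: Python's json decoder raises it when the text it receives is empty or is not JSON at all. A plausible cause here, though this is an assumption and not something confirmed for this API, is that some requests return an empty body or an HTML error page. A minimal sketch of a guard follows; the helper name parse_json_or_none is hypothetical.

```python
import json


def parse_json_or_none(text):
    """Decode text as JSON; return None if it is empty or not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none('{"TIPO": {"ID": "381390223", "DESC": "MINI COOPER"}}'))
print(parse_json_or_none(''))               # empty response body
print(parse_json_or_none('<html></html>'))  # HTML page instead of JSON
```

In the scraper, this could be applied to session.get(link).text, skipping the URL whenever the result is None instead of letting the exception stop the run.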
