
python - Scraping data that is loaded when an option is selected

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:29:08

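I need to scrape car makes and models. I can already scrape the list of options in the car make select, but not the list of options in the car model select, because that list is only loaded after a car make has been selected.

Does anyone have an idea how I can get the car model options to load for a given car make?

Here is my code.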

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = "http://autoplius.lt/redaguoti/naudoti-automobiliai/"
page = requests.get(url)

soup = BeautifulSoup(page.content, "html.parser")

car_makes_select = soup.find("select", {"id": "make_id"})

car_makes = car_makes_select.select("option")

for item in car_makes:
    itemMain = item

    itemMain = itemMain.get('value')

    payload = {
        'make_id': itemMain
    }

    form = requests.post(url, params=payload)

    soup11 = BeautifulSoup(form.text, "html.parser")

    model_select = soup11.find("select", {"id": "model_id"})

    print model_select

Best answer

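You need to POST the form data that the page sends when a make is selected. Every field except parent_id can be hard-coded; parent_id is the value of each option inside the make_id select: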

import requests
from bs4 import BeautifulSoup

url = "http://autoplius.lt/redaguoti/naudoti-automobiliai/"
page = requests.get(url)

# form data fields can be hard coded bar parent_id
data = {"target_id": "model_id",
        "project": "autoplius",
        "category_id": "2",
        "type": "edit",
        "my_anns": "false",
        "__block": "ann_ajax_0_plius",
        "__opcode": "ajaxGetChildsTo"}

soup = BeautifulSoup(page.content, "html.parser")

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"}

# first two options hold no ids
car_makes = soup.select("#make_id option + option + option")

for car in car_makes:
    print(car.text)
    # pass id/value
    data["parent_id"] = car["value"]
    # data for a post not params
    form = requests.post(url, data=data, headers=headers)
    soup11 = BeautifulSoup(form.text, "html.parser")
    print(soup11)
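
The comment "data for a post not params" marks the key difference from the code in the question. As a minimal sketch of that difference, the snippet below prepares two requests without sending them and prints where the fields end up. The parent_id value "1" is a made-up placeholder, not a real id from the site.

```python
import requests

url = "http://autoplius.lt/redaguoti/naudoti-automobiliai/"
fields = {"parent_id": "1"}  # hypothetical id, for illustration only

# prepare() builds the request object without sending it,
# so nothing here touches the network.
with_params = requests.Request("POST", url, params=fields).prepare()
with_data = requests.Request("POST", url, data=fields).prepare()

# params= appends the fields to the URL as a query string and leaves the body empty
print(with_params.url)
print(with_params.body)

# data= keeps the URL unchanged and form-encodes the fields into the request body
print(with_data.url)
print(with_data.body)
print(with_data.headers["Content-Type"])
```

A server-side handler that reads form fields from the POST body will not see values passed through params=, which would explain why the question's request did not return the model list.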

The POST to the site returns data in the following format:

"<option value=\"\">- Pasirinkite -<\/option><option value=\"1088\">-kita-<\/option><option value=\"16230\">208<\/option><option value=\"16231\">246<\/option><option value=\"16232\">250<\/option><option value=\"16233\">275<\/option><option value=\"16234\">288<\/option><option value=\"16235\">308<\/option><option value=\"16236\">328<\/option><option value=\"16237\">330<\/option><option value=\"1093\">348<\/option><option value=\"1089\">360<\/option><option value=\"16238\">365<\/option><option value=\"16239\">400<\/option><option value=\"16240\">412<\/option><option value=\"1086\">456<\/option><option value=\"16229\">458<\/option><option value=\"16245\">512<\/option><option value=\"1085\">550<\/option><option value=\"1084\">575<\/option><option value=\"1092\">599 GTB Fiorano<\/option><option value=\"16246\">612<\/option><option value=\"1090\">612 Scaglietti<\/option><option value=\"16247\">750<\/option><option value=\"1079\">Barchetta<\/option><option value=\"16250\">California<\/option><option value=\"16248\">Daytona<\/option><option value=\"1078\">Enzo<\/option><option value=\"1083\">F 355<\/option><option value=\"18638\">F 360<\/option><option value=\"1087\">F 40<\/option><option value=\"1091\">F 430<\/option><option value=\"1081\">F 50<\/option><option value=\"1080\">F 512<\/option><option value=\"1077\">Maranello<\/option><option value=\"1076\">Mondial<\/option><option value=\"16249\">Superamerica<\/option><option value=\"1075\">Testarossa<\/option>"

We need to tidy that up a little and extract only the models:

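car_makes = soup.select("#make_id option + option + option")

for car in car_makes:
    print(car.text)
    data["parent_id"] = car["value"]
    form = requests.post(url, data=data, headers=headers)
    # strip the surrounding quotes and undo the \" and \/ escaping before parsing
    # (form.content is a str on Python 2; on Python 3 it is bytes, so use form.text there)
    soup11 = BeautifulSoup(form.content.strip('"').replace('\\"','"').replace("\/", "/"), "html.parser")
    # skip the first two options and drop purely numeric model names
    print([opt.text for opt in soup11.select("option + option + option") if not opt.text.isdigit()])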

If we run the code, we get output like this:

In [10]: for car in car_makes:
   ....:     print(car.text)
   ....:     data["parent_id"] = car["value"]
   ....:     form = requests.post(url, data=data, headers=headers)
   ....:     soup11 = BeautifulSoup(form.content.strip('"').replace('\\"','"').replace("\/", "/"), "html.parser")
   ....:     print([opt.text for opt in soup11.select("option + option + option") if not opt.text.isdigit()])
   ....:
AC
[u'Ace', u'Aceca', u'Cobra']
Acura
[u'CL', u'EL', u'ILX', u'Integra', u'MDX', u'NSX', u'RDX', u'RL', u'RSX', u'SLX', u'TL', u'TLX', u'TSX', u'Vigor', u'ZDX']
Aixam
[u'A751', u'City', u'Crossline', u'Ligier', u'Scouty']
Alfa Romeo
[u'4C', u'8C', u'Alfasud', u'Alfetta', u'Arna', u'Brera', u'Crosswagon Q4', u'Giulia', u'Giulietta', u'GT', u'GTV', u'Junior', u'Mito', u'RZ/SZ', u'Spider', u'Sportwagon', u'Sprint']
Alpina
[u'B12', u'B3', u'B5', u'B6', u'B7', u'B8', u'D10', u'D3', u'Roadster S']
AMC
[u'Ambassador', u'Concord', u'Eagle', u'Gremlin', u'Javelin', u'Matador', u'Pacer', u'Rambler', u'Rebel', u'Spirit']
ARO
[u'K450', u'Spartana']
Asia
[u'Hi-Topic', u'Retona', u'Rocsta', u'Towner']
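
The response shown earlier looks like a JSON-encoded string literal (surrounding double quotes, \" and \/ escapes), so as an alternative to the chained strip/replace calls it can probably be decoded with json.loads. The sketch below rests on that assumption and uses only the standard library; OptionCollector and parse_models are hypothetical helper names, and the sample is a shortened copy of the response format above. Unlike the answer's selector, it keeps every option that has a non-empty value, including numeric model names.

```python
import json
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, text) pairs from <option> elements."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._value = None  # value of the <option> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value") or ""
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._value = None


def parse_models(raw_response):
    """Decode the JSON string literal, then return the options that carry an id."""
    html = json.loads(raw_response)  # undoes the \" and \/ escaping
    collector = OptionCollector()
    collector.feed(html)
    collector.close()
    return [(value, text) for value, text in collector.options if value]


# Shortened sample in the same format as the response shown above.
sample = r'"<option value=\"\">- Pasirinkite -<\/option><option value=\"1088\">-kita-<\/option><option value=\"16230\">208<\/option><option value=\"1078\">Enzo<\/option>"'
models = parse_models(sample)
print(models)
```

The decoded HTML string could equally be passed to BeautifulSoup(html, "html.parser") as in the answer; the standard-library parser is used here only to keep the sketch free of third-party imports.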

Regarding python - Scraping data that is loaded when an option is selected, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37765026/
