
python - Retrieve all car links from a dynamic page

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:38:12

from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--user-agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'")
#options.add_argument("headless")
driver = webdriver.Chrome(executable_path="/home/timmy/Python/chromedriver", chrome_options=options)

url = "https://turo.com/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00"
driver.get(url)

list_of_all_car_links = []
x = 0
while True:
    # Re-parse the whole page after every scroll
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    for i in soup.find_all("a", href=True):
        if i['href'].startswith("/rentals") and len(i['href']) > 31:
            link2 = "https://turo.com" + i['href']
            list_of_all_car_links.append(link2)
    try:
        x = scrolldown(last_height=x)
    except KeyError:
        # scrolldown raises KeyError once the page height stops growing
        #driver.close()
        break

I tried scrolling down and then collecting the links, but I only get some of them. Here is my scroll-down function:

import time

def scrolldown(last_height=0, SCROLL_PAUSE_TIME=3, num_tries=2):
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    new_height = driver.execute_script("return document.body.scrollHeight")

    # break condition
    if last_height == new_height:
        #print("hello")
        num_tries -= 1
        if num_tries == 0:
            print("Reached End of page")
            raise KeyError
        else:
            scrolldown(last_height=last_height, SCROLL_PAUSE_TIME=2, num_tries=num_tries)

    return new_height
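
For comparison, the same stop condition can be written as a plain loop, which may be easier to reason about: the recursive retry in scrolldown does not use the value returned by the nested call. This is only a sketch, not the asker's code. The names scroll_until_stable, get_height and scroll_to_bottom are hypothetical, and the two callables are passed in so the loop can be exercised without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=3.0,
                        max_stalls=2, sleep=time.sleep):
    """Scroll until the page height is unchanged max_stalls times in a row.

    get_height and scroll_to_bottom are caller-supplied callables
    (hypothetical names). Returns the number of scrolls performed.
    """
    last_height = get_height()
    stalls = 0
    scrolls = 0
    while stalls < max_stalls:
        scroll_to_bottom()
        scrolls += 1
        sleep(pause)  # give the page time to load more results
        new_height = get_height()
        if new_height == last_height:
            stalls += 1  # nothing new loaded on this attempt
        else:
            stalls = 0  # progress was made, reset the retry counter
            last_height = new_height
    return scrolls


# With the Selenium driver from the question, the callables could be:
# scroll_until_stable(
#     get_height=lambda: driver.execute_script(
#         "return document.body.scrollHeight"),
#     scroll_to_bottom=lambda: driver.execute_script(
#         "window.scrollTo(0, document.body.scrollHeight)"),
# )
```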

I also tried parsing the HTML with BeautifulSoup after every scroll and then collecting the links, but that did not return all of them either.

What I want is to get every car link on that page.
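
A side note on the scroll-based approach: the loop above re-parses the whole page on every pass and appends every match again, so the list fills up with duplicates. If the page also drops earlier results from the DOM while scrolling (an assumption, not something confirmed here), merging each pass into an ordered, de-duplicated collection would keep links seen only in earlier passes. A minimal sketch follows. The helper name merge_car_links and the sample paths are hypothetical; the filter is the one from the question.

```python
def merge_car_links(seen, hrefs, base_url="https://turo.com", min_len=31):
    """Add car links found in hrefs to seen, a dict used as an ordered set.

    The filter (prefix "/rentals", length > 31) is the question's own.
    """
    for href in hrefs:
        if href.startswith("/rentals") and len(href) > min_len:
            seen.setdefault(base_url + href, None)  # keeps first-seen order
    return seen


# Two simulated scroll passes with made-up paths; in the real loop the hrefs
# would come from [a["href"] for a in soup.find_all("a", href=True)].
pass_1 = ["/rentals/suv/co/denver/example-car-a/100001", "/how-it-works"]
pass_2 = ["/rentals/suv/co/denver/example-car-a/100001",
          "/rentals/suv/co/denver/example-car-b/100002"]

seen = {}
merge_car_links(seen, pass_1)
merge_car_links(seen, pass_2)
all_links = list(seen)
```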

Best answer

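I would use requests and the API endpoint that shows up in the XHR list of the browser's dev tools. Note the items-per-page parameter in the query string, itemsPerPage=200. You can try changing it to get a larger result set.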

import requests

# API endpoint seen in the XHR list; same query string as the search page
url = 'https://turo.com/api/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00'
baseUrl = 'https://turo.com'
headers = {
    'Referer': 'https://turo.com/search?country=US&defaultZoomLevel=7&endDate=03%2F20%2F2019&endTime=10%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=Colorado%2C%20USA&locationType=City&maximumDistanceInMiles=30&northEastLatitude=41.0034439&northEastLongitude=-102.040878&region=CO&sortType=RELEVANCE&southWestLatitude=36.992424&southWestLongitude=-109.060256&startDate=03%2F15%2F2019&startTime=10%3A00',
    'User-Agent': 'Mozilla/5.0',
}

r = requests.get(url, headers=headers).json()
results = []

# Each entry in r['list'] carries a relative vehicle URL
for item in r['list']:
    results.append(baseUrl + item['vehicle']['url'])
print(results)
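
To experiment with itemsPerPage without editing the long URL by hand, the query string can be built from a dict. The sketch below uses only the standard library. The function name build_search_url is hypothetical, and the parameter values are copied from the URL above. Whether the server honors a value larger than 200 is not confirmed here.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://turo.com/api/search"  # endpoint from the answer's URL


def build_search_url(items_per_page=200):
    """Rebuild the search API URL with a configurable itemsPerPage."""
    params = {
        "country": "US",
        "defaultZoomLevel": "7",
        "endDate": "03/20/2019",
        "endTime": "10:00",
        "international": "true",
        "isMapSearch": "false",
        "itemsPerPage": str(items_per_page),
        "location": "Colorado, USA",
        "locationType": "City",
        "maximumDistanceInMiles": "30",
        "northEastLatitude": "41.0034439",
        "northEastLongitude": "-102.040878",
        "region": "CO",
        "sortType": "RELEVANCE",
        "southWestLatitude": "36.992424",
        "southWestLongitude": "-109.060256",
        "startDate": "03/15/2019",
        "startTime": "10:00",
    }
    # quote_via=quote encodes spaces as %20 (not +), as in the original URL
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


larger_url = build_search_url(items_per_page=500)
```

The resulting string can be passed to requests.get in place of url, with the same headers.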

Regarding "python - Retrieve all car links from a dynamic page", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55156292/
