
Python scraping 'things to do' from tripadvisor

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:14:24

From this page, I want to scrape the list "Types of Things to Do in Miami" (you can find it near the end of the page). Here is what I have so far:

import requests
from bs4 import BeautifulSoup

# Define header to prevent errors
user_agent = "Mozilla/44.0.2 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/9.0.2"

headers = {'User-Agent': user_agent}

new_url = "https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html"
# Get response from url
response = requests.get(new_url, headers = headers)
# Encode response for parsing
html = response.text.encode('utf-8')
# Soupify response
soup = BeautifulSoup(html, "lxml")

tag_elements = soup.findAll("a", {"class":"attractions-attraction-overview-main-Pill__pill--23S2Q"})

# Iterate over tag_elements and extract strings
tags_list = []
for i in tag_elements:
    tags_list.append(i.string)

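The problem is that I am getting the values from the "Commonly Searched For in Miami" area of the page, which is below the "Types of Things..." part of the page. I am also missing some of the values I need, such as "Traveler Resources (7)", "Day Trips (7)", etc. The class names for the two lists, "Things to do..." and "Commonly searched...", are the same, and I am using the class in soup.findAll(), which I guess might be what causes this problem. What is the correct way to do this? Should I take some other approach?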

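Best answer

This is quite simple in the browser (driver here is presumably a Selenium WebDriver instance with the page already loaded):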

filters = driver.execute_script("return [...document.querySelectorAll('.filterName a')].map(a => a.innerText)")
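
The one-liner above leaves out how driver is created. A minimal sketch of the surrounding setup follows, assuming Selenium 4 and a local Chrome install. The names FILTER_JS, clean_names and fetch_filters are hypothetical and are not part of the original answer, and the .filterName selector is taken from the answer as-is, so it may no longer match the live page.

```python
# JavaScript from the answer: collect the visible text of every link
# inside an element with class "filterName".
FILTER_JS = (
    "return [...document.querySelectorAll('.filterName a')]"
    ".map(a => a.innerText)"
)

# URL taken from the question.
URL = "https://www.tripadvisor.com/Attractions-g34438-Activities-Miami_Florida.html"


def clean_names(raw_names):
    """Strip whitespace and drop empty or non-string entries (hypothetical helper)."""
    return [
        name.strip()
        for name in raw_names
        if isinstance(name, str) and name.strip()
    ]


def fetch_filters(url=URL):
    """Open the page in Chrome and return the filter names.

    Assumption: Selenium 4 with a local Chrome install. Selenium is imported
    lazily so the helper above can be used without it.
    """
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # execute_script returns the JavaScript array as a Python list.
        return clean_names(driver.execute_script(FILTER_JS))
    finally:
        # Always close the browser, even if the script fails.
        driver.quit()
```

Calling fetch_filters() opens a browser window, runs the script, and returns a list of strings. If the list comes back empty, the filter section may not have rendered yet or the class name may have changed, in which case an explicit wait or an updated selector would be needed.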

Regarding "Python scraping 'things to do' from tripadvisor", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53452863/
