
python - Extracting data with Selenium and Python from a popup that appears on mouse hover

Reposted. Author: 行者123. Updated: 2023-12-05 06:15:31

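Hi everyone, this is my first question. I am trying to extract data from a website, but the data only appears when I hover the mouse over it. The website is http://insideairbnb.com/melbourne/. When I hover the mouse pointer over a point on the map, I want to extract each listing's occupancy rate from the panel that pops up. I am trying to use @frianH's code from the Stack Overflow post "Scrape website with dynamic mouseover event". I am new to data extraction with Selenium, although I know the bs4 package. I have not managed to find the right XPath for the task. Thank you in advance. My code so far is: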

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='C:\\Users\\Kunal\\chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()

# Wait for all circles
elements = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="map"]/div[1]/div[2]/div[2]/svg')))
table = browser.find_element(By.CLASS_NAME, 'leaflet-zoom-animated')

# Scroll the map into view
browser.execute_script("arguments[0].scrollIntoView(true);", table)

data = []
for circle in elements:
    # Move the pointer to each circle
    ActionChains(browser).move_to_element(circle).perform()
    # Wait for the mouseover effect
    mouseover = WebDriverWait(browser, 30).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="neighbourhoodBoundaries"]')))
    data.append(mouseover.text)

print(data[0])

Thanks in advance.

Best answer

I checked a number of pages, and the site seems quite resistant to Selenium's own methods, so we have to rely on JavaScript. Here is the full code:

from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()

# Set up a 30-second WebDriver wait
explicit_wait30 = WebDriverWait(browser, 30)

try:
    # Wait for all circles to load
    circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))
except TimeoutException:
    # The page did not load in time: refresh and wait once more so that `circles` is defined
    browser.refresh()
    circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))

data = []
for circle in circles:
    # Dispatch a mouseover event on the element
    browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
    # Wait for the data to appear
    listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
    # listing now contains the full element list - you can parse this yourself and add the necessary data to `data`
    .......
    # Close the listing
    browser.execute_script("arguments[0].click()", listing.find_element(By.TAG_NAME, 'button'))

I also used CSS selectors instead of XPath. Here is how the process works:

circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))

This waits until all the circles are present and collects them into circles.

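Keep in mind that the page loads the circles very slowly, so I set up a try/except block that automatically refreshes the page if it has not loaded within 30 seconds. Change this as you see fit.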
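If a single refresh turns out not to be enough, the same idea can be extended to a bounded number of attempts. The following is only a sketch: `wait_with_refresh`, its parameters, and the default of three attempts are hypothetical names chosen for illustration and are not part of the original answer. With Selenium, `wait_fn` would be something like `lambda: explicit_wait30.until(...)`, `refresh_fn` would be `browser.refresh`, and `timeout_exc` would be `TimeoutException`.

```python
def wait_with_refresh(wait_fn, refresh_fn, timeout_exc, attempts=3):
    """Call wait_fn until it succeeds, refreshing between failed attempts.

    wait_fn     -- zero-argument callable that returns the located elements
    refresh_fn  -- zero-argument callable that reloads the page
    timeout_exc -- exception type that wait_fn raises on timeout
    attempts    -- maximum number of calls to wait_fn (assumed default: 3)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return wait_fn()
        except timeout_exc:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the timeout
            refresh_fn()
```

Passing the wait and refresh steps in as callables keeps the retry logic independent of the browser, so it can be tried out with stand-in functions before pointing it at the real page.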

Now we have to iterate over all the circles:

for circle in circles:

Next, we simulate a mouseover event on the circle, which we will do with JavaScript.

This is what the JavaScript looks like (note that circle refers to the element we will pass in from Selenium):

const mouseoverEvent = new Event('mouseover');
circle.dispatchEvent(mouseoverEvent)

The script is executed through Selenium like this:

browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)

Now we have to wait for the listing to appear:

listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))

Now you have listing, an element that itself contains many other elements. You can easily extract each of them and store them in data.

If you do not need to extract each element separately, simply calling .text on listing gives a result like this:

'Tanya\n(No other listings)\n23127829\nSerene room for a single person or a couple.\nGreater Dandenong\nPrivate room\n$37 income/month (est.)\n$46 /night\n4 night minimum\n10 nights/year (est.)\n2.7% occupancy rate (est.)\n0.1 reviews/month\n1 reviews\nlast: 20/02/2018\nLOW availability\n0 days/year (0%)\nclick listing on map to "pin" details'
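Since the question asks specifically for the occupancy rate, one possible way to pull it out of that text is a regular expression. This is a minimal sketch, not part of the original answer: `parse_occupancy` is a hypothetical helper name, and the pattern assumes the panel always writes the rate as "<number>% occupancy rate", which is inferred from the single sample above.

```python
import re

# Excerpt of the sample listing text shown above
sample = ('Private room\n$37 income/month (est.)\n$46 /night\n4 night minimum\n'
          '10 nights/year (est.)\n2.7% occupancy rate (est.)\n0.1 reviews/month')

def parse_occupancy(listing_text):
    """Return the estimated occupancy rate as a float percentage, or None if absent."""
    match = re.search(r'(\d+(?:\.\d+)?)%\s+occupancy rate', listing_text)
    return float(match.group(1)) if match else None

occupancy = parse_occupancy(sample)
```

Inside the loop, this could be used as `data.append(parse_occupancy(listing.text))`, with `None` marking listings where the pattern did not match.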

That's it. You can then append the result to data, and you are done!

This post on extracting data with Selenium and Python from a mouse-hover popup is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/62469332/
