Python Selenium web scraping

Reposted. Author: 行者123. Last updated: 2023-11-30 22:04:02

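Hello, I am trying to scrape a web page with Python and Selenium. The information I want from the page is the match information/scoreboard: for example the current set, the player names, and each player's score. I keep getting a TimeoutException. Can someone tell me how to retrieve this information? A link to an example page is below.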

https://www.bovada.lv/sports/tennis/itf-men/chile-singles/a-tabilo-i-monzon-201811211325

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 50)
small_wait = WebDriverWait(driver, 50)


driver.execute_script('window.open("https://www.bovada.lv/sports/tennis/itf-men/chile-singles/a-tabilo-i-monzon-201811211325","_self")')

#//*[@id="tracker__header"]
dat = []
try:
    dat.append([wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="tracker__header"]/div/div[1]/div/div[2]'))).text])
except TimeoutException:
    print('error')

driver.quit()

The information I want to get from the site was shown in a screenshot (image not preserved).

Best answer

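The scoreboard is rendered inside an iframe, so the XPath never matches in the top-level document. You need to switch to the iframe before you can read the value: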

# find_element_by_css_selector was removed in Selenium 4; use find_element(By.CSS_SELECTOR, ...)
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, 'iframe[id^="iframe-tracker-"]'))
try:
    dat.append(wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="tracker__header"]/div/div[1]/div/div[2]'))).text)
except TimeoutException:
    print('error')

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53418867/
