
python - Selenium webdriver returns an empty list from find_elements_by_X

Reposted. Author: 行者123. Updated: 2023-12-02 03:01:30

My goal is to get a list of the names of all new items posted on https://www.prusaprinters.org/prints within the 24 hours of a given day.

From some reading, I understand that I should use Selenium, because the site I am scraping is dynamic (it loads more objects as the user scrolls).

The problem is that I can't seem to get anything but an empty list from webdriver.find_elements_by_, with any of the suffixes listed at https://selenium-python.readthedocs.io/locating-elements.html.

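On the site, when I inspect the element whose title I want, I see "class = name" and "class = clamp-two-lines" (see screenshot), but I can't seem to return a list of all elements on the page that have either the name class or the clamp-two-lines class.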

[Screenshot: prusaprinters inspect element]

Here is my code so far (the commented-out lines are failed attempts):

from timeit import default_timer as timer
start_time = timer()
print("Script Started")

import bs4, selenium, smtplib, time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(r'D:\PortableApps\Python Peripherals\chromedriver.exe')

url = 'https://www.prusaprinters.org/prints'
driver.get(url)
# foo = driver.find_elements_by_name('name')
# foo = driver.find_elements_by_xpath('name')
# foo = driver.find_elements_by_class_name('name')
# foo = driver.find_elements_by_tag_name('name')
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[class*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=clamp-two-lines]')]
# foo = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="printListOuter"]//ul[@class="clamp-two-lines"]/li')))
print(foo)
driver.quit()

print("Time to run: " + str(round(timer() - start_time,4)) + "s")

My research:

  1. Selenium only returns an empty list
  2. Selenium find_elements_by_css_selector returns an empty list
  3. Web Scraping Python (BeautifulSoup,Requests)
  4. Get HTML Source of WebElement in Selenium WebDriver using Python
  5. How to get Inspect Element code in Selenium WebDriver
  6. Web Scraping Python (BeautifulSoup,Requests)
  7. https://chrisalbon.com/python/web_scraping/monitor_a_website/
  8. https://www.codementor.io/@gergelykovcs/how-and-why-i-built-a-simple-web-scrapig-script-to-notify-us-about-our-favourite-food-fcrhuhn45
  9. https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_dynamic_websites.htm

Best answer

To get the text, wait for the elements to become visible. The CSS selector for the titles is #printListOuter h3:

titles = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))

for title in titles:
    print(title.text)

A shorter version using a list comprehension:

wait = WebDriverWait(driver, 10)
titles = [title.text for title in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))]
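
The question notes that the page loads more items as the user scrolls, so the wait above only sees the items rendered on first load. One common approach, sketched here as an assumption rather than as part of the original answer, is to scroll to the bottom repeatedly until the page height stops growing, and only then collect the titles. The helper name scroll_until_stable, the pause length, and the round limit are all hypothetical, and the sketch assumes the site scrolls the main window rather than an inner container.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom until the page height stops changing.

    Hypothetical helper: `driver` only needs an `execute_script` method,
    as a Selenium WebDriver has. Returns the number of scroll rounds made.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the next batch.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was appended, so assume the list is complete.
            return rounds
        last_height = new_height
    return max_rounds
```

Calling scroll_until_stable(driver) after driver.get(url) and before the wait should let the title lookup see more of the list. The max_rounds cap keeps the loop from running forever on a feed that never ends.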

Regarding "python - Selenium webdriver returns an empty list from find_elements_by_X", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59868524/
