
python - Finding email addresses on a page with selenium

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:14:10

I'm trying to get a list of email addresses from a website and I'm very close. My code is below. I'm getting the following error.

What happens is that a link on the results page is clicked, and the page it leads to contains an email address.

I am trying to print the email address on each of those pages after clicking its link.

Here is an example of a page that the link clicks through to.

Traceback (most recent call last):
  File "scrape.py", line 34, in <module>
    lookup(driver)
  File "scrape.py", line 26, in lookup
    emailAdress = driver.find_element_by_xpath('//div[@id="widget-contact"]//a‌​').get_attribute('hr‌​ef')
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 293, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException:

I'm using Python 2.7.13.

# -*- coding: utf-8 -*-

from lxml import html
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def init_driver():
    driver = webdriver.Firefox()
    # Attach an explicit-wait helper with a 5-second timeout to the driver
    driver.wait = WebDriverWait(driver, 5)
    return driver


def lookup(driver):
    # Load the directory search results page
    driver.get("http://www.sportbirmingham.org/directory?sport=&radius=15&postcode=B16+8QG&submit=Search")
    try:
        # Click each result heading, then read the contact link on the page it opens
        for link in driver.find_elements_by_xpath('//h2[@class="heading"]/a'):
            link.click()
            emailAdress = driver.find_element_by_xpath('//div[@id="widget-contact"]//a‌​').get_attribute('hr‌​ef')
            print emailAdress
    except TimeoutException:
        print "not found"


if __name__ == "__main__":
    driver = init_driver()
    lookup(driver)
    time.sleep(5)
    driver.quit()
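In this copy of the question, the selector strings appear to contain invisible zero-width characters (U+200C ZERO WIDTH NON-JOINER followed by U+200B ZERO WIDTH SPACE) right after //a and between "hr" and "ef". That typically happens when code is copied from a web page, and a stray character inside an XPath expression is one plausible cause of an InvalidSelectorException. A minimal Python 3 sketch, standard library only, for spotting and removing such characters before handing a selector to Selenium; find_invisible_chars and strip_invisible_chars are hypothetical helper names, and treating every Unicode "format" (Cf) character as debris is an assumption that suits selectors but not arbitrary text:

```python
import unicodedata


def find_invisible_chars(s):
    """Return (index, character name) for each Unicode format (Cf) character in s.

    Cf covers zero-width characters such as U+200B ZERO WIDTH SPACE and
    U+200C ZERO WIDTH NON-JOINER, which render as nothing on screen.
    """
    return [
        (i, unicodedata.name(ch, hex(ord(ch))))
        for i, ch in enumerate(s)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible_chars(s):
    """Return s with every Unicode format (Cf) character removed."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


# The selector as it appears in the question, with the debris written as escapes
selector = '//div[@id="widget-contact"]//a\u200c\u200b'
print(find_invisible_chars(selector))
print(strip_invisible_chars(selector))
```

Checking the selector and attribute name this way (or simply retyping them by hand) rules out hidden characters before looking for problems in the XPath itself.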

When I try to go on to the next page's link, I get the following error:

  File "scrape.py", line 43, in <module>
    lookup(driver)
  File "scrape.py", line 26, in lookup
    links.extend([link.get_attribute('href') for link in driver.find_elements_by_xpath('//h2[@class="heading"]/a')])
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 139, in get_attribute
    self, name)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 465, in execute_script
    'args': converted_args})['value']
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference is stale. Either the element is no longer attached to the DOM or the page has been refreshed.
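The WebElement objects returned by find_elements belong to the results page; once the browser navigates away, those references are no longer attached to the DOM, which is what the StaleElementReferenceException message says. A common workaround is to read every href as a plain string while still on the results page, then visit each URL with driver.get. A minimal Python 3 sketch, assuming Selenium 4's find_elements(By.XPATH, ...) (the find_element_by_* helpers used in the question were deprecated in Selenium 4 and later removed) and assuming the detail pages match the XPath from the answer below; collect_hrefs and scrape_emails are hypothetical helper names:

```python
# Sketch only: the XPaths below assume the page layout described in this thread.
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:  # By.XPATH is the plain string "xpath"
    XPATH = "xpath"

# Result links on the directory page (XPath from the question)
LISTING_XPATH = '//h2[@class="heading"]/a'
# Email field on each detail page (XPath from the answer; assumed layout)
EMAIL_XPATH = '//div[@class="body"]/dl/dd[2]'


def collect_hrefs(driver, xpath=LISTING_XPATH):
    """Read each result link's href while still on the results page.

    Plain strings survive navigation; WebElement references do not.
    Links without an href are skipped.
    """
    hrefs = (el.get_attribute("href") for el in driver.find_elements(XPATH, xpath))
    return [h for h in hrefs if h]


def scrape_emails(driver, start_url, email_xpath=EMAIL_XPATH):
    """Visit every result page by URL; return its email text, or None if absent."""
    driver.get(start_url)
    hrefs = collect_hrefs(driver)  # snapshot before leaving the results page
    emails = []
    for href in hrefs:
        driver.get(href)  # fresh page load; no old element references are reused
        found = driver.find_elements(XPATH, email_xpath)  # empty list, no exception
        emails.append(found[0].text if found else None)
    return emails
```

With a real browser this would be called as scrape_emails(webdriver.Firefox(), "http://www.sportbirmingham.org/directory?sport=&radius=15&postcode=B16+8QG&submit=Search"), followed by driver.quit(). Using find_elements for the email field returns an empty list instead of raising when a page has no match, so one missing entry does not stop the loop.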

Best answer

You just need a more precise XPath (and to read the element's text property):

emailAdress = driver.find_element_by_xpath('//div[@class="body"]/dl/dd[2]').text
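The answer reads the visible text of the dd element. The question's code read a link's href instead; if that link is a mailto: link (an assumption about the page), the href also carries the scheme and possibly a query such as ?subject=..., so the address has to be extracted from it. A minimal Python 3 sketch using only urllib.parse; email_from_mailto is a hypothetical helper name:

```python
from urllib.parse import unquote, urlsplit


def email_from_mailto(href):
    """Return the address from a mailto: URL, or None if href is not one.

    Assumes a single recipient: "mailto:a@x.com,b@x.com" comes back as one
    comma-separated string.
    """
    if not href:
        return None
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return None
    # For mailto: URLs the address lands in .path; "?subject=..." goes to .query
    return unquote(parts.path) or None


print(email_from_mailto("mailto:info@example.com?subject=Hello"))
```

Reading the element's text, as the answer does, avoids this parsing step; the href route is only useful when the visible text is something other than the address (for example "Email us").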

Note that this example is for Python 3. Please let me know whether it works for you. I also recommend the "XPath Helper" extension for Chrome.

For "python - Finding email addresses on a page with selenium", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42186921/
