python - Reading web links from a DataFrame raises "stale element reference: element is not attached to the page document" error

Reposted by: 行者123. Updated: 2023-12-04 17:09:31

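I have a DataFrame that contains links to the Google reviews of two restaurants. I want to load all the reviews of both restaurants into the browser (one restaurant after the other) and then save them to a new DataFrame. I wrote a script that reads all the reviews and loads them into the browser, as follows: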

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
import glob  # needed for the glob.glob() call further down
import time

link_df =   Link
0   https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318a3aa3041455:0x5f83f4fae76d8656,1,,,&rlfi=hd:;si:6882614014013965910,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEiglZKhm6qAgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSARJidXJtZXNlX3Jlc3RhdXJhbnSqAQwQASoIIgRmb29kKAA,y,UB2auy7TMYs;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]
1   https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318bf82139caaf:0xf115cd7fe794cbcc,1,,,&rlfi=hd:;si:17372017086881385420,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEjh9auu-q6AgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSAQpyZXN0YXVyYW50qgEMEAEqCCIEZm9vZCgA,y,ZeJbBWd7wDg;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]

i = 0
driver = webdriver.Chrome()
for index, i in link_df.iterrows():
    base_url = i['Link']   #link_df['Link'][i]
    
    driver.get(base_url)
    WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[./span[text()='Newest']]"))).click()
    print('Restaurant number is ',index)
    
    title = driver.find_element_by_xpath("//div[@class='P5Bobd']").text
    address = driver.find_element_by_xpath("//div[@class='T6pBCe']").text
    overall_rating = driver.find_element_by_xpath("//div[@class='review-score-container']//span[@class='Aq14fc']").text
    
    total_reviews_text =driver.find_element_by_xpath("//div[@class='review-score-container']//div//div//span//span[@class='z5jxId']").text
    num_reviews = int (total_reviews_text.split()[0])
    all_reviews = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
    time.sleep(2)
    total_reviews = len(all_reviews)
    
    while total_reviews < num_reviews:
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        time.sleep(5)
        all_reviews = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))
        print(total_reviews)
        total_reviews +=1
    reviews_info = driver.find_elements_by_xpath("//div[@class='jxjCjc']")
    review_information = pd.DataFrame(columns=["Restaurant title","Restaurant rating","Total reviews","Reviewer Name","Rating", "Review"])
    name= ''
    rating = ''
    text = ''
    
    
    for index,review_info in enumerate(reviews_info):
        name = review_info.find_element_by_xpath("./div/div/a").text
        rating = review_info.find_element_by_xpath(".//div[@class='PuaHbe']//g-review-stars//span").get_attribute('aria-label')
        text = review_info.find_element_by_xpath(".//div[@class='Jtu6Td']//span").text
        review_information.at[len(review_information)] = [title,overall_rating,num_reviews,name,rating,text]
    
    filename = 'Google_reviews' + ' ' +pd.to_datetime("now").strftime("%Y_%m_%d")+'.csv'
    files_present = glob.glob(filename)
    if files_present:
        review_information.to_csv(filename,index=False,mode='a',header=False)
    else:
        review_information.to_csv(filename,index=False)
    
    driver.get('https://www.google.com')
    time.sleep(3)

The problem is that the script throws an error when it reaches the following line:

driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])

It throws the following error:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=95.0.4638.69)

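When I try the same program without storing the Google links in a DataFrame (that is, without the for loop, and with the Google review link assigned to base_url directly instead of base_url = i['Link']), it works fine.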

I'm not sure where I made a mistake. Any suggestion or help to fix the problem would be greatly appreciated.

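Best answer

Edit

Put the creation of the driver outside the for loop.
You can't launch a new URL with GPS data while the first popup is still in front; if you do, the new page stays in the background. The easier way is to launch a new URL without GPS data (https://www.google.com) and wait 3 seconds before continuing your loop:
Your count was wrong. I have changed your selector, changed how the total is computed, and commented out some lines.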

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.firefox.options import Options
import time

link_df =  ["https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318a3aa3041455:0x5f83f4fae76d8656,1,,,&rlfi=hd:;si:6882614014013965910,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEiglZKhm6qAgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSARJidXJtZXNlX3Jlc3RhdXJhbnSqAQwQASoIIgRmb29kKAA,y,UB2auy7TMYs;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]",
            "https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318bf82139caaf:0xf115cd7fe794cbcc,1,,,&rlfi=hd:;si:17372017086881385420,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEjh9auu-q6AgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSAQpyZXN0YXVyYW50qgEMEAEqCCIEZm9vZCgA,y,ZeJbBWd7wDg;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]"
           ]
i = 0
binary = r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe'
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
options = Options()
options.binary = binary
driver = webdriver.Firefox(options=options, capabilities=cap, executable_path="E:\\Téléchargement\\geckodriver.exe")

# I have to load one URL first so that I can accept the cookies manually
# (by setting a breakpoint right after it); you probably don't need this.
#driver.get(link_df[0])

print ("Headless Firefox Initialized")


print(link_df)
for url in link_df:
    base_url = url    # i['Link']  # link_df['Link'][i]
    print(base_url)
    driver.get(base_url)
    # 'Avis les plus récents' is the label of the 'Newest' sort option in a French-locale browser
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[./span[text()='Avis les plus récents']]"))).click()

    title = driver.find_element_by_xpath("//div[@class='P5Bobd']").text
    address = driver.find_element_by_xpath("//div[@class='T6pBCe']").text
    overall_rating = driver.find_element_by_xpath("//div[@class='review-score-container']//span[@class='Aq14fc']").text

    total_reviews_text = driver.find_element_by_xpath(
        "//div[@class='review-score-container']//div//div//span//span[@class='z5jxId']").text
    num_reviews = int(total_reviews_text.split()[0])
    all_reviews = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#reviewSort .gws-localreviews__google-review')))
    # time.sleep(2)
    total_reviews = 0

    while total_reviews < num_reviews:
        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
        WebDriverWait(driver, 5, 0.25).until_not(EC.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
        
        all_reviews = WebDriverWait(driver, 5).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#reviewSort .gws-localreviews__google-review')))
        total_reviews = len(all_reviews)
        print(total_reviews, len(all_reviews))

    driver.get('https://www.google.com')  # or driver.close() if that causes no errors
    time.sleep(3)

driver.close()
driver.quit()

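For Chrome, the solution seems to need some fixes:

org.openqa.selenium.StaleElementReferenceException: stale element reference: element is not attached to the page document

Literally, this means that the referenced element is stale and is no longer attached to the current page. Usually this happens because the page was refreshed or navigated away from. The fix is to locate the element again with findElement or findElements.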
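
A minimal sketch of that "locate again" advice, not taken from the original answer. The helper `retry_on_stale` and the stand-in classes `FakeStale` and `FakePage` are hypothetical names, used so that the example runs without a browser. With Selenium, the exception to pass in would be `selenium.common.exceptions.StaleElementReferenceException`.

```python
import time


def retry_on_stale(locate, act, stale_exc, attempts=3, delay=0.0):
    """Call act(locate()), locating the element again whenever it has gone stale."""
    last_error = None
    for _ in range(attempts):
        element = locate()              # always fetch a fresh reference
        try:
            return act(element)
        except stale_exc as error:      # the reference was invalidated by a DOM update
            last_error = error
            time.sleep(delay)
    raise last_error


# Hypothetical stand-ins so the sketch runs without a browser.
class FakeStale(Exception):
    pass


class FakePage:
    def __init__(self, rerenders):
        self.version = 0
        self.rerenders = rerenders      # how many times the DOM is rebuilt under us

    def find_last_review(self):
        return self.version             # the "element" is the DOM version it came from

    def scroll_to(self, element):
        if self.rerenders > 0:
            self.rerenders -= 1
            self.version += 1           # DOM rebuilt: older references are now stale
        if element != self.version:
            raise FakeStale("element is not attached to the page document")
        return "scrolled"


page = FakePage(rerenders=2)
print(retry_on_stale(page.find_last_review, page.scroll_to, FakeStale))
```

In the scripts on this page, `locate` could be a lambda returning `driver.find_elements(By.CSS_SELECTOR, '#reviewSort .gws-localreviews__google-review')[-1]` and `act` a lambda that calls `driver.execute_script('arguments[0].scrollIntoView(true);', element)`. That wiring is an assumption and has not been run against the live page.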

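So for Chrome there seems to be a refresh problem. I suggest reloading the list of reviews before scrolling, to get a fresh copy of the DOM items, and I also had to add a 1-second wait at the end of the while loop: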

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.action_chains import ActionChains
#from selenium.webdriver.firefox.options import Options
from selenium.webdriver.chrome.options import Options
import time

link_df =  [
    "https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318a3aa3041455:0x5f83f4fae76d8656,1,,,&rlfi=hd:;si:6882614014013965910,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEiglZKhm6qAgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSARJidXJtZXNlX3Jlc3RhdXJhbnSqAQwQASoIIgRmb29kKAA,y,UB2auy7TMYs;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]",
    "https://www.google.com/search?q=restaurant+in+christchurch&biw=1280&bih=614&hotel_occupancy=2&tbm=lcl&sxsrf=AOaemvI4qlEAr3btedb6PCx9U53RtXkI2Q%3A1635630947742&ei=Y799YaHfLOKZ4-EPoeqjmA4&oq=restaurant+in+christchurch&gs_l=psy-ab.3...0.0.0.614264.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.7jAOI05vCjI#lrd=0x6d318bf82139caaf:0xf115cd7fe794cbcc,1,,,&rlfi=hd:;si:17372017086881385420,l,ChpyZXN0YXVyYW50IGluIGNocmlzdGNodXJjaEjh9auu-q6AgAhaKBAAGAAYAiIacmVzdGF1cmFudCBpbiBjaHJpc3RjaHVyY2gqBAgDEACSAQpyZXN0YXVyYW50qgEMEAEqCCIEZm9vZCgA,y,ZeJbBWd7wDg;mv:[[-43.4870861,172.6509735],[-43.5490232,172.5976049]]"
]

i = 0
binaryfirefox = r'C:\Program Files (x86)\Mozilla Firefox\firefox.exe'
binarychrome = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'


options = Options()

#cap = DesiredCapabilities().CHROME
#cap["marionette"] = True
#cap = DesiredCapabilities().FIREFOX
#options.binary = binaryfirefox
#driver = webdriver.Firefox(options=options, capabilities=cap, executable_path="E:\\Téléchargement\\geckodriver.exe")

options.binary_location  = binarychrome
driver = webdriver.Chrome(options=options, executable_path="E:\\Téléchargement\\chromedriver.exe" )

# Same as with Firefox: I have to load one URL first
# so that I can accept the cookies manually.
#driver.get(link_df[0])



print(link_df)
for url in link_df:
    base_url = url    # i['Link']  # link_df['Link'][i]
    print(base_url)
    driver.get(base_url)
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[./span[text()='Newest']]"))).click()

    title = driver.find_element_by_xpath("//div[@class='P5Bobd']").text
    address = driver.find_element_by_xpath("//div[@class='T6pBCe']").text
    overall_rating = driver.find_element_by_xpath("//div[@class='review-score-container']//span[@class='Aq14fc']").text

    total_reviews_text = driver.find_element_by_xpath(
        "//div[@class='review-score-container']//div//div//span//span[@class='z5jxId']").text
    num_reviews = int(total_reviews_text.split()[0])
    all_reviews = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#reviewSort .gws-localreviews__google-review')))
    # time.sleep(2)
    total_reviews = 0

    while total_reviews < num_reviews:
        # Re-locate the reviews to avoid the exception (or wrap the scroll in try/except, but that is more expensive)
        all_reviews = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#reviewSort .gws-localreviews__google-review')))

        driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])

        total_reviews = len(all_reviews)
        print(total_reviews, len(all_reviews))
        time.sleep(1)

    driver.get('https://www.google.com')  # or driver.close() if that causes no errors
    time.sleep(3)

driver.close()
driver.quit()
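
The loop above stops only once the number of located reviews reaches `num_reviews`. If the page never renders that many items (an assumption about how the page might behave, not something reported in the question), the loop would not terminate. Below is a minimal sketch of the same "re-query, then scroll" loop with a stall guard. `scroll_until_loaded`, `max_stalls` and `FakeReviewList` are hypothetical names, and the fake list stands in for the browser so that the sketch is runnable on its own.

```python
import time


def scroll_until_loaded(count_items, scroll_to_last, target, max_stalls=3, pause=0.0):
    """Scroll until `target` items are present or the count stops growing.

    count_items() must re-query the page and return the current item count;
    scroll_to_last() must re-locate the last item and scroll it into view.
    """
    loaded = count_items()
    stalls = 0
    while loaded < target and stalls < max_stalls:
        scroll_to_last()
        time.sleep(pause)               # give the page time to append more items
        current = count_items()
        stalls = stalls + 1 if current == loaded else 0
        loaded = current
    return loaded


class FakeReviewList:
    """Hypothetical stand-in for the review panel: each scroll reveals up to 10 more."""

    def __init__(self, available):
        self.available = available
        self.shown = min(10, available)

    def count(self):
        return self.shown

    def scroll(self):
        self.shown = min(self.shown + 10, self.available)


complete = FakeReviewList(available=34)
print(scroll_until_loaded(complete.count, complete.scroll, target=34))

# The page advertises 40 reviews but only 25 can ever be loaded.
short = FakeReviewList(available=25)
print(scroll_until_loaded(short.count, short.scroll, target=40))
```

With Selenium, `count_items` would wrap the `WebDriverWait(...).until(EC.presence_of_all_elements_located(...))` call and return its length, and `pause` would take the place of the `time.sleep(1)` in the loop above.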

Regarding python - Reading web links from a DataFrame raises "stale element reference: element is not attached to the page document" error, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/69782860/
