python - Selenium Python script behaves differently in Windows and Ubuntu environments - 6ren


Reposted by: 太空宇宙 | Updated: 2023-11-03 16:53:10

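I ran the script on both Windows and Ubuntu, each with Python 3 and the latest version of geckodriver, and it behaves differently on the two systems. The full script is below.

I am trying to scrape data for several different tests from a test-prep website. There are different subjects; each subject has specializations, each specialization has practice tests, and each practice test has several questions. The scrape function walks through the steps to collect each type of data.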

subject <--- specialization <---- practice-test *------ question
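
For illustration, the hierarchy above could be modeled as nested records. This is only a minimal sketch: the class names are hypothetical, the field names are borrowed from the dictionary keys used in the script below, and the values are placeholders rather than scraped data.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Question:
    id: str
    pre: str
    body: str
    answers: List[str] = field(default_factory=list)


@dataclass
class PracticeTest:
    name: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class Specialization:
    name: str
    practices: List[PracticeTest] = field(default_factory=list)


@dataclass
class Subject:
    name: str
    specializations: List[Specialization] = field(default_factory=list)


# Placeholder values only; nothing here comes from the real site.
question = Question(id="1", pre="", body="2 + 2 = ?", answers=["3", "4", "5", "22"])
practice = PracticeTest(name="Example Practice Test", questions=[question])
specialization = Specialization(name="Example Specialization", practices=[practice])
subject = Subject(name="Example Subject", specializations=[specialization])

# asdict() recurses through nested dataclasses and lists,
# so the result can be passed straight to json.dumps().
serialized = json.dumps(asdict(subject))
```

Building the tree this way makes each child explicitly attached to its parent before anything is serialized.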

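The get_questions function is where the behavior differs:

On Windows, it behaves as expected. After a choice is clicked on the last question, the browser goes to the results page.

On Ubuntu, when a choice is clicked on the last question, the last question reloads, and the script keeps clicking the same choice and reloading the same question.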

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pathlib
import time
import json
import os

driver=webdriver.Firefox(executable_path="./geckodriver.exe")
wait = WebDriverWait(driver, 15)
data=[]

def setup():

   driver.get('https://www.varsitytutors.com/practice-tests')
   try:
      go_away_1= driver.find_element_by_class_name("ub-emb-iframe")
      driver.execute_script("arguments[0].style.visibility='hidden'", go_away_1)
      go_away_2= driver.find_element_by_class_name("ub-emb-iframe-wrapper")
      driver.execute_script("arguments[0].style.visibility='hidden'", go_away_2)
      go_away_3= driver.find_element_by_class_name("ub-emb-visible")
      driver.execute_script("arguments[0].style.visibility='hidden'", go_away_3)
   except:
      pass

def get_subjects(subs=[]):
   subject_clickables_xpath="/html/body/div[3]/div[9]/div/*/div[@data-subject]/div[1]"
   subject_clickables=driver.find_elements_by_xpath(subject_clickables_xpath)
   subject_names=map(lambda x : x.find_element_by_xpath('..').get_attribute('data-subject'), subject_clickables)
   subject_pairs=zip(subject_names, subject_clickables)
   return subject_pairs

def get_specializations(subject):

   specialization_clickables_xpath="//div//div[@data-subject='"+subject+"']/following-sibling::div//div[@class='public_problem_set']//a[contains(.,'Practice Tests')]"
   specialization_names_xpath="//div//div[@data-subject='"+subject+"']/following-sibling::div//div[@class='public_problem_set']//a[contains(.,'Practice Tests')]/../.."
   specialization_names=map(lambda x : x.get_attribute('data-subject'), driver.find_elements_by_xpath(specialization_names_xpath))
   specialization_clickables = driver.find_elements_by_xpath(specialization_clickables_xpath)
   specialization_pairs=zip(specialization_names, specialization_clickables)
   return specialization_pairs

def get_practices(subject, specialization):
   practice_clickables_xpath="/html/body/div[3]/div[8]/div[3]/*/div[1]/a[1]"
   practice_names_xpath="//*/h3[@class='subject_header']"
   lengths_xpath="/html/body/div[3]/div[8]/div[3]/*/div[2]"
   lengths=map(lambda x : x.text, driver.find_elements_by_xpath(lengths_xpath))
   print(lengths)
   practice_names=map(lambda x : x.text, driver.find_elements_by_xpath(practice_names_xpath))
   practice_clickables = driver.find_elements_by_xpath(practice_clickables_xpath)
   practice_pairs=zip(practice_names, practice_clickables)
   return practice_pairs

def remove_popup():
   try:

      button=wait.until(EC.element_to_be_clickable((By.XPATH,"//button[contains(.,'No Thanks')]")))
      button.location_once_scrolled_into_view
      button.click()
   except:
      print('could not find the popup')

def get_questions(subject, specialization, practice):
   remove_popup()
   questions=[]
   current_question=None
   while True:
      question={}
      try:
         WebDriverWait(driver,5).until(EC.presence_of_element_located((By.XPATH,"/html/body/div[3]/div[7]/div[1]/div[2]/div[2]/table/tbody/tr/td[1]")))
         question_number=driver.find_element_by_xpath('/html/body/div[3]/div[7]/div[1]/div[2]/div[2]/table/tbody/tr/td[1]').text.replace('.','')
         question_pre=driver.find_element_by_class_name('question_pre')
         question_body=driver.find_element_by_xpath('/html/body/div[3]/div[7]/div[1]/div[2]/div[2]/table/tbody/tr/td[2]/p')
         answer_choices=driver.find_elements_by_class_name('question_row')
         answers=map(lambda x : x.text, answer_choices)
         question['id']=question_number
         question['pre']=question_pre.text
         question['body']=question_body.text
         question['answers']=list(answers)
         questions.append(question)
         choice=WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"input.test_button")))
         driver.execute_script("arguments[0].click();", choice[3])
         time.sleep(3)
      except Exception as e:
         if 'results' in driver.current_url:
            driver.get(driver.current_url.replace('http://', 'https://'))
            # last question has been answered; record results
            remove_popup()
            pathlib.Path('data/'+subject+'/'+specialization).mkdir(parents=True, exist_ok=True)
            with open('data/'+subject+'/'+specialization+'/questions.json', 'w') as outfile:
               json.dump(list(questions), outfile)
               break
         else:
            driver.get(driver.current_url.replace('http://', 'https://'))
   return questions


def scrape():
   setup()
   subjects=get_subjects()
   for subject_name, subject_clickable in subjects:
      subject={}
      subject['name']=subject_name
      subject['specializations']=[]
      subject_clickable.click()
      subject_url=driver.current_url.replace('http://', 'https://')
      specializations=get_specializations(subject_name)
      for specialization_name, specialization_clickable in specializations:
         specialization={}
         specialization['name']=specialization_name
         specialization['practices']=[]
         specialization_clickable.click()
         specialization_url=driver.current_url.replace('http://', 'https://')
         practices=get_practices(subject_name, specialization_name)
         for practice_name, practice_clickable in practices:
            practice={}
            practice['name']=practice_name
            practice_clickable.click()
            questions=get_questions(subject_name, specialization_name, practice_name)
            practice['questions']=questions
            driver.get(specialization_url)
         driver.get(subject_url)
      data.append(subject)
   print(data)
scrape()

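Can anyone help me figure out what is causing this?

Best answer

It is just timing. After the last question is answered, the next page takes much longer to load than the 3-second sleep, so the loop runs again while the last question is still displayed and clicks the same choice. Waiting for the current page to go away fixes this and also speeds up the script.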

from selenium.common.exceptions import StaleElementReferenceException
<snip>
         choice=WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"input.test_button")))
         choice[3].click()
         try:
            # poll until the clicked choice is no longer displayed
            while choice[3].is_displayed():
               time.sleep(1)
         except StaleElementReferenceException as e:
            # the old page is gone, so move on to the next loop iteration
            continue
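
The polling loop in the answer has no upper bound, so it would spin forever if the page never changed. As a hedged sketch (not part of the original answer), the same idea can be wrapped in a small helper with a timeout. The name `wait_until_gone` and its parameters are hypothetical; the exception class is passed in, so the helper itself does not depend on Selenium.

```python
import time


def wait_until_gone(element, stale_error, timeout=30.0, poll=0.5):
    """Return True once `element` is hidden or stale, False on timeout.

    `element` only needs an is_displayed() method; `stale_error` is the
    exception type raised when the element's page has been replaced.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if not element.is_displayed():
                return True
        except stale_error:
            # The old page was replaced, so the reference is no longer valid.
            return True
        time.sleep(poll)
    return False
```

In the script this would presumably be called as `wait_until_gone(choice[3], StaleElementReferenceException)` right after the click. Selenium also ships a built-in condition for the same purpose, `expected_conditions.staleness_of`, which can be used with `WebDriverWait(...).until(...)` instead of a hand-written loop.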

This post about "python - Selenium Python script behaves differently in Windows and Ubuntu environments" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59077712/
