
python - Selenium gives "Timed out receiving message from renderer" for all websites after some execution time

Reposted. Author: 行者123. Updated: 2023-12-03 14:37:50

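I have an app that needs a long-running instance of the Selenium web driver (I am using Chrome driver 83.0.4103.39 in headless mode). Basically, the app continuously pulls url-data from a queue and passes each extracted url to Selenium, which should perform some analysis on the website. Many of these websites may be down, unreachable or broken, so I have set a page load timeout of 10 seconds to keep Selenium from waiting forever for a page to load.
The problem is that after some execution time (say 10 minutes) Selenium starts to give a Timed out receiving message from renderer error for every url. Initially it works properly: it opens good websites correctly and times out on bad ones (websites that fail to load). After a while, though, it starts timing out on everything, even websites that should open correctly (I have checked, and they open correctly in the Chrome browser).
I am having a hard time debugging this problem, because every exception in the application is caught correctly. I have also noticed that the problem happens only in headless mode.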

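  • Update
    During the website analysis I also need to consider iframes (top-level only), so I have also added logic that switches the driver context to each iframe in the main page and extracts its html.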

  • This is a simplified version of the application:
    import traceback
    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    width = 1024
    height = 768

    chrome_options = Options()
    chrome_options.page_load_strategy = 'normal'
    chrome_options.add_argument('--enable-automation')
    chrome_options.add_argument('disable-infobars')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--lang=en')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--allow-insecure-localhost')
    chrome_options.add_argument('--allow-running-insecure-content')
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-browser-side-navigation')
    chrome_options.add_argument('--mute-audio')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--force-device-scale-factor=1')
    chrome_options.add_argument(f'window-size={width}x{height}')

    chrome_options.add_experimental_option(
        'prefs', {
            'intl.accept_languages': 'en,en_US',
            'download.prompt_for_download': False,
            'download.default_directory': '/dev/null',
            'automatic_downloads': 2,
            'download_restrictions': 3,
            'notifications': 2,
            'media_stream': 2,
            'media_stream_mic': 2,
            'media_stream_camera': 2,
            'durable_storage': 2,
        }
    )

    driver = webdriver.Chrome(options=chrome_options)
    driver.set_page_load_timeout(10) # Timeout 10 seconds

    # Polling queue
    # NOTE: `queue` is the application's url queue; it is not defined in this simplified version.
    while True:
        url = queue.pop()

        # Try open url
        try:
            driver.get(url)
        except BaseException as e:
            print(e)
            print(traceback.format_exc())
            continue

        # Take website screenshot
        png = driver.get_screenshot_as_png()

        # Extract html from iframes (if any)
        htmls = [driver.page_source]
        iframes = driver.find_elements(By.XPATH, "//iframe")

        for index, iframe in enumerate(iframes):
            try:
                driver.switch_to.frame(index)
                htmls.append(driver.page_source)
                driver.switch_to.default_content()
            except BaseException as e:
                print(e)
                print(traceback.format_exc())
                continue

        # Do some analysis
        for html in htmls:
            # ...
            pass

        # Wait a bit
        sleep(0.1)
Here is an example of the stack trace:
    Opening https://www.yourmechanic.com/user/appointment/3732777/?access_token=HLZYIg&ukey=6quWpg1724633&rcode=abttgi&utm_medium=sms&utm_source=rb
    LOAD EXCEPTION Message: timeout: Timed out receiving message from renderer: 10.000
    (Session info: headless chrome=83.0.4103.116)

    Traceback (most recent call last):
      File "/Users/macbmacbookpro4ookpro4/Documents/Projects/python/proj001/main.py", line 202, in inference
        driver.get(url)
      File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
        self.execute(Command.GET, {'url': url})
      File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
        self.error_handler.check_response(response)
      File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 10.000
      (Session info: headless chrome=83.0.4103.116)
Does anyone have a clue why, after running correctly for a while, Selenium starts to give a timeout exception for every url it tries to open?
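
Because every exception is caught and the loop simply continues, the moment at which "bad sites time out" turns into "everything times out" is easy to miss in the logs. A minimal sketch for making that transition visible is to count consecutive page-load timeouts. `TimeoutStreak` and its `threshold` are hypothetical names introduced here for illustration; they are not part of Selenium or of the original application.

```python
class TimeoutStreak:
    """Counts consecutive page-load timeouts to flag a possibly unhealthy session."""

    def __init__(self, threshold=5):
        # Hypothetical cut-off: how many timeouts in a row count as suspicious.
        self.threshold = threshold
        self.consecutive = 0

    def record(self, timed_out):
        """Record one page load; return True once `threshold` timeouts occur in a row."""
        # Any successful load resets the streak.
        self.consecutive = self.consecutive + 1 if timed_out else 0
        return self.consecutive >= self.threshold
```

In the polling loop this would be called with `True` in the `except` branch around `driver.get(url)` and with `False` after a successful load. What to do once it returns `True` (log extra diagnostics, or quit and recreate the driver) is left open here and is not something the answer below prescribes.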

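Best answer

This error message...

    selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 10.000

...implies that ChromeDriver was unable to communicate with the browsing context, i.e. the Chrome browser session.

Deep dive
This error can arise for several reasons. A few of those reasons and their remedies are as follows: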
  • disable-infobars and --enable-automation are almost analogous, and disable-infobars is no longer maintained. --enable-automation will serve your purpose. So you need to drop:
    chrome_options.add_argument('disable-infobars')

  • You can find a detailed discussion in Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76


  • --enable-automation is still an experimental_option, so you need:
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])

  • You can find a detailed discussion in How can I use setExperimentalOption through Options using FirefoxDriver in Selenium IDE?


  • If you intend to use --enable-automation, you need to use useAutomationExtension as well:
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
  • --disable-gpu is no longer necessary, so you need to drop:
    chrome_options.add_argument('--disable-gpu')

  • You can find a detailed discussion in Chrome Options in Python Selenium : Disable GPU vs Headless


  • You can opt to use a larger viewport through window-size, e.g. 1920,1080 (width and height separated by a comma):
    chrome_options.add_argument("window-size=1920,1080")

  • You can find a detailed discussion in How to set window size in Selenium Chrome Python


  • To initiate headless mode, instead of chrome_options.add_argument('--headless') you need to use the headless attribute, as follows:
    chrome_options.headless = True

  • You can find a detailed discussion in How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?


  • As you have enumerated all the elements, it is worth mentioning that you can't switch_to every <iframe>/<frame>, because some of them may have the style attribute set to display: none;.

  • You can find a detailed discussion in Expected condition failed: waiting for element to be clickable for element containing style=“display: none;”


  • Finally, to switch to a frame you need to induce WebDriverWait for the desired frame_to_be_available_and_switch_to_it(), as follows:
    WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#whovaIframeSpeaker")))
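
Applied to the question's iframe loop, the two frame-related points above could look roughly like the sketch below. It is an illustration under assumptions, not a verified fix: `collect_frame_html` and `selenium_wait_and_switch` are hypothetical helper names, frames are passed in as already-located elements, and hidden frames are skipped purely because of the answer's remark about display: none (drop that check if their html is needed).

```python
def selenium_wait_and_switch(driver, frame, timeout):
    """Wait until `frame` is available and switch to it, using Selenium's explicit wait."""
    # Imported lazily so the collection logic below can be exercised without a browser.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        EC.frame_to_be_available_and_switch_to_it(frame)
    )


def collect_frame_html(driver, frames, timeout=10,
                       wait_and_switch=selenium_wait_and_switch):
    """Return the main page html followed by the html of each visible frame."""
    htmls = [driver.page_source]
    for frame in frames:
        try:
            if not frame.is_displayed():
                continue  # skip hidden frames, e.g. style="display: none;"
            wait_and_switch(driver, frame, timeout)
            htmls.append(driver.page_source)
        except Exception as exc:
            # A frame that cannot be entered is reported and skipped.
            print(f"Skipping frame: {exc}")
        finally:
            # Always return to the top-level document before the next frame.
            driver.switch_to.default_content()
    return htmls
```

With a live driver this would be called as `collect_frame_html(driver, driver.find_elements(By.TAG_NAME, "iframe"))`. Resetting the context in `finally` means a failed switch cannot leave the driver inside a frame for the next url.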


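Taken together, the option-related items above might be applied to the question's configuration roughly as follows. This is a sketch under assumptions rather than a verified fix for the timeout: `build_chrome_arguments` and `build_chrome_options` are hypothetical helper names, and the switches are those from the question minus the ones the answer says to drop. Headless mode is requested with the `--headless` switch, which is understood to be what the `headless` attribute adds behind the scenes in Selenium releases of that period. Check this against your installed version.

```python
def build_chrome_arguments(width=1920, height=1080, headless=True):
    """Return the question's Chrome switches after applying the answer's remedies."""
    arguments = [
        # 'disable-infobars', '--disable-gpu' and '--enable-automation' are omitted.
        "--no-sandbox",
        "--lang=en",
        "--ignore-certificate-errors",
        "--allow-insecure-localhost",
        "--allow-running-insecure-content",
        "--disable-notifications",
        "--disable-dev-shm-usage",
        "--disable-browser-side-navigation",
        "--mute-audio",
        "--force-device-scale-factor=1",
        # Width and height are separated by a comma, as in the answer's example.
        f"--window-size={width},{height}",
    ]
    if headless:
        arguments.append("--headless")
    return arguments


def build_chrome_options(width=1920, height=1080, headless=True):
    """Build a Selenium Options object from the switches above."""
    # Imported lazily so build_chrome_arguments() can be inspected without Selenium.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.page_load_strategy = "normal"
    for argument in build_chrome_arguments(width, height, headless):
        options.add_argument(argument)
    # The two experimental options shown in the answer.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options
```

The result would be passed as `webdriver.Chrome(options=build_chrome_options())`. Keeping the switch list in a separate function makes it easy to print or diff the exact arguments when comparing headless and non-headless runs.
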
References
You can find a couple of relevant detailed discussions on Timed out receiving message from renderer in:
  • Timed out receiving message from renderer: 10.000
  • How to handle “Unable to receive message from renderer” in chrome driver?
  • Timed out receiving message from renderer
  • Regarding python - Selenium gives "Timed out receiving message from renderer" for all websites after some execution time, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62889739/
