
python - Selenium in Python: one timeout causes all subsequent requests to time out

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:48:06

ChromeDriver version: 2.41
Chrome version: 69.0.3497.92

Here is my code, which sends multiple requests through a single WebDriver and handles exceptions:

from selenium import webdriver
from selenium.common.exceptions import *

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')

driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
driver.set_page_load_timeout(30)

for link in links:
    try:
        driver.get(link)
    except TimeoutException as e:
        # do something
        continue
    except Exception as e:
        # do some other thing
        continue

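The expected behavior is that if a TimeoutException is thrown, I move on and request the next link, and so on. What I get instead is that once one TimeoutException occurs, all of the remaining links throw TimeoutException as well.

Here is the relevant log from the ChromeDriver logger.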

[1536872569.507][SEVERE]: Timed out receiving message from renderer: 29.449
[1536872569.509][INFO]: Timed out. Stopping navigation...
[1536872569.509][DEBUG]: DEVTOOLS COMMAND Page.stopLoading (id=1243) {

}
[1536872569.509][DEBUG]: DEVTOOLS RESPONSE Page.stopLoading (id=1243) {

}
[1536872569.509][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1244) {
"expression": "1"
}
[1536872569.510][SEVERE]: Timed out receiving message from renderer: -0.002
[1536872569.513][INFO]: Done waiting for pending navigations. Status: timeout
[1536872569.513][INFO]: RESPONSE Navigate timeout
(Session info: headless chrome=69.0.3497.92)
[1536872569.516][INFO]: COMMAND Navigate {
"sessionId": "9caf0bad68147065f14c9c22632cd6d8",
"url": "www.example.com"
}
[1536872569.516][DEBUG]: DEVTOOLS EVENT Page.frameStoppedLoading {
"frameId": "620369B66F0605C0CE359F34F9D95E36"
}
[1536872569.516][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1244) {
"result": {
"description": "1",
"type": "number",
"value": 1
}
}
[1536872569.516][INFO]: Waiting for pending navigations...
[1536872569.516][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1245) {
"expression": "1"
}
[1536872569.517][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1245) {
"result": {
"description": "1",
"type": "number",
"value": 1
}
}
[1536872599.516][SEVERE]: Timed out receiving message from renderer: 30.000
[1536872599.518][INFO]: Timed out. Stopping navigation...
[1536872599.518][DEBUG]: DEVTOOLS COMMAND Page.stopLoading (id=1246) {

}
[1536872599.518][DEBUG]: DEVTOOLS RESPONSE Page.stopLoading (id=1246) {

}
[1536872599.518][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1247) {
"expression": "1"
}
[1536872599.518][SEVERE]: Timed out receiving message from renderer: -0.002
[1536872599.522][INFO]: Done waiting for pending navigations. Status: timeout
[1536872599.522][INFO]: RESPONSE Navigate timeout
(Session info: headless chrome=69.0.3497.92)
[1536872599.524][INFO]: COMMAND Navigate {
"sessionId": "9caf0bad68147065f14c9c22632cd6d8",
"url": "www.example2.com"
}

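Here are the differences I found when comparing this event with other consecutive requests that completed without any exception.

1) DEVTOOLS EVENT Page.frameStoppedLoading occurs immediately after the request for the new link "www.example.com" is sent.

2) The response to DEVTOOLS COMMAND Runtime.evaluate (id=1244), which was sent for the previous link, is logged after the request for the new URL.

Question: is there any way to handle this other than restarting the driver every time a TimeoutException occurs?

I would also be very grateful if someone could explain this behavior in detail. Thanks.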

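Best answer

Update:

After reading the logs further, I realized that trying to send another request immediately causes that request not to be sent at all. The two observations I made in my original post also occur when requests succeed, so you can ignore them.

Below is a comparison between the log of a successful consecutive request and the log of a consecutive request made after a timeout exception was handled.

When ChromeDriver starts, the browser session gets an id (referred to later as frameId).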

   [1536915601.693][DEBUG]: DevTools request: http://localhost:34899/json
[1536915601.694][DEBUG]: DevTools response: [ {
"description": "",
"devtoolsFrontendUrl": "/devtools/inspector.html?ws=localhost:34899/devtools/page/A417CC5AE2C87A4D0FC64CF66B54ED72",
"id": "A417CC5AE2C87A4D0FC64CF66B54ED72",
"title": "data:,",
"type": "page",
"url": "data:,",
"webSocketDebuggerUrl": "ws://localhost:34899/devtools/page/A417CC5AE2C87A4D0FC64CF66B54ED72"
} ]


Now, case 1: a normal request after a successful response:

  [1536915607.033][INFO]: Done waiting for pending navigations. Status: ok
[1536915607.033][INFO]: RESPONSE GetSource "\u003C!DOCTYPE html>\u003Chtml xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"ko\">\u003Chead>\u003Cmeta http-equiv=\"Content-Type\" content=\"text/h tml; charset=utf-8\" />\n\u003Cmeta name=\"viewport\" content=\"width=device-width, in..."
[1536915607.044][INFO]: COMMAND Navigate {
"sessionId": "d11fb86ec1b49a141f99fe1ec4286a85",
"url": "http://www.gelloy.com/product/detail.html?product_no=438&cate_no=30&display_group=1"
}
# ------ skipped for conciseness ----- #
[1536915607.044][INFO]: Done waiting for pending navigations. Status: ok
[1536915607.044][DEBUG]: DEVTOOLS COMMAND Page.navigate (id=49) {
"url": "http://www.gelloy.com/product/detail.html?product_no=438&cate_no=30&display_group=1"
}
[1536915609.244][DEBUG]: DEVTOOLS RESPONSE Page.navigate (id=49) {
"frameId": "A417CC5AE2C87A4D0FC64CF66B54ED72",
"loaderId": "0EB53CDA615428AA73A9DB67F5FF65E1"
}

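Here, I can see:
- COMMAND Navigate - prepares the next request
- DEVTOOLS COMMAND Page.navigate - makes the request
- DEVTOOLS RESPONSE Page.navigate - returns the frameId given at the start

Compare this with:

Case 2: a request sent immediately after a timeout is triggered: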

  [1536872569.513][INFO]: Done waiting for pending navigations. Status: timeout
[1536872569.513][INFO]: RESPONSE Navigate timeout
(Session info: headless chrome=69.0.3497.92)
[1536872569.516][INFO]: COMMAND Navigate {
"sessionId": "9caf0bad68147065f14c9c22632cd6d8",
"url": "www.example.com"
}
[1536872569.516][DEBUG]: DEVTOOLS EVENT Page.frameStoppedLoading {
"frameId": "620369B66F0605C0CE359F34F9D95E36"
}
[1536872569.516][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1244) {
"result": {
"description": "1",
"type": "number",
"value": 1
}
}
[1536872569.516][INFO]: Waiting for pending navigations...
[1536872569.516][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1245) {
"expression": "1"
}
[1536872569.517][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1245) {
"result": {
"description": "1",
"type": "number",
"value": 1
}
}
[1536872599.516][SEVERE]: Timed out receiving message from renderer: 30.000

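After the timeout, however, I see COMMAND Navigate with the next URL to fetch, but DEVTOOLS COMMAND Page.navigate never takes place. So once 30 seconds have passed since COMMAND Navigate was created, the driver decides whether the page was loaded based on the result of the latest RESPONSE Page.navigate. That in turn results in a timeout.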
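The comparison above can be automated with a small log scan. This is only a minimal sketch, assuming the log keeps the `[timestamp][LEVEL]: message` layout shown in the excerpts; the function name `find_unsent_navigations` is made up for illustration and is not part of Selenium or ChromeDriver. It reports every `COMMAND Navigate` that is not followed by a `DEVTOOLS COMMAND Page.navigate` before the next `COMMAND Navigate` or the end of the log.

```python
import re

# Start of a ChromeDriver log entry, e.g.
# "[1536872569.516][INFO]: COMMAND Navigate {"
ENTRY = re.compile(r"^\s*\[(\d+\.\d+)\]\[(\w+)\]: (.*)$")
URL_FIELD = re.compile(r'"url":\s*"([^"]*)"')


def find_unsent_navigations(log_lines):
    """Return (timestamp, url) for each COMMAND Navigate that was not
    followed by a DEVTOOLS COMMAND Page.navigate.

    Hypothetical helper; timestamps are kept as the strings found in the log.
    """
    unsent = []
    pending = None  # [timestamp, url] of the Navigate currently tracked
    for line in log_lines:
        entry = ENTRY.match(line)
        if entry:
            timestamp, _level, message = entry.groups()
            if message.startswith("COMMAND Navigate"):
                if pending is not None:
                    # A new Navigate arrived before Page.navigate was sent.
                    unsent.append(tuple(pending))
                pending = [timestamp, None]
            elif message.startswith("DEVTOOLS COMMAND Page.navigate"):
                # The request really went out to the browser.
                pending = None
        elif pending is not None and pending[1] is None:
            # Continuation line of the Navigate payload: pick up its URL.
            url = URL_FIELD.search(line)
            if url:
                pending[1] = url.group(1)
    if pending is not None:
        unsent.append(tuple(pending))
    return unsent
```

Feeding it the lines of case 2 should flag the request for www.example.com, while the lines of case 1 should produce an empty list. Note that a log truncated right after a `COMMAND Navigate` entry is also reported, so the last item may be a false positive.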


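Solution

I decided to close the driver with driver.quit() and open a new browser every time a timeout exception occurs. Putting a time.sleep(1) before continuing the loop also seemed to work, but I can't be sure that 1 second is enough.

Here is my updated code: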

driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
driver.set_page_load_timeout(30)

for link in links:
    try:
        driver.get(link)
    except TimeoutException as e:
        # do something
        # Discard the timed-out session and start a fresh browser.
        driver.quit()
        driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
        driver.set_page_load_timeout(30)
        continue
    except Exception as e:
        # do some other thing
        continue
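The restart-on-timeout loop can also be wrapped in a function so that the driver is always closed at the end. This is only a sketch of the same idea, not a different fix: `visit_all`, `make_driver`, and `timeout_error` are names invented here, and the driver factory and exception class are passed in as parameters so the function does not depend on a particular Selenium version.

```python
def visit_all(links, make_driver, timeout_error, page_load_timeout=30):
    """Visit each link, replacing the driver after any page-load timeout.

    make_driver   -- zero-argument callable returning a fresh WebDriver
    timeout_error -- exception class raised on a page-load timeout
                     (selenium's TimeoutException in practice)
    Returns the list of links that timed out.
    """
    timed_out = []
    driver = make_driver()
    driver.set_page_load_timeout(page_load_timeout)
    try:
        for link in links:
            try:
                driver.get(link)
            except timeout_error:
                timed_out.append(link)
                # The old session may never issue Page.navigate again,
                # so throw it away and start a new browser.
                driver.quit()
                driver = make_driver()
                driver.set_page_load_timeout(page_load_timeout)
    finally:
        # Close whichever driver is current, even if an error escapes.
        driver.quit()
    return timed_out
```

With the setup from the question, a call might look like `visit_all(links, lambda: webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options), TimeoutException)`. Exceptions other than the timeout are left to propagate here; handle them inside the loop if the run should continue past them, as the original code does.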

Regarding "python - Selenium in Python: one timeout causes all subsequent requests to time out", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52324331/
