python - phantomjs raises OSError: [Errno 9] Bad file descriptor

Reposted by: 太空宇宙 · Updated: 2023-11-03 14:52:23


When I use phantomjs inside a Scrapy downloader middleware, it sometimes raises:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/home/ttc/ruyi-scrapy/saibolan/saibolan/hz_webdriver_middleware.py", line 47, in process_request
    driver.quit()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 76, in quit
    self.service.stop()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 149, in stop
    self.send_remote_shutdown_command()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/service.py", line 67, in send_remote_shutdown_command
    os.close(self._cookie_temp_file_handle)
OSError: [Errno 9] Bad file descriptor

It does not happen on every request: I crawled 80 pages and it occurred 30 times, always inside the phantomjs middleware:

import time

from scrapy.exceptions import IgnoreRequest
from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait


class HZPhantomjsMiddleware(object):

    def __init__(self, settings):
        self.phantomjs_driver_path = settings.get('PHANTOMJS_DRIVER_PATH')
        self.cloud_mode = settings.get('CLOUD_MODE')

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        # A display is needed in production; comment this out for local debugging
        # if self.cloud_mode:
        #     display = Display(visible=0, size=(800, 600))
        #     display.start()
        dcap = dict(DesiredCapabilities.PHANTOMJS)
        dcap["phantomjs.page.settings.userAgent"] = (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36")
        # A new PhantomJS driver is started for every request
        driver = webdriver.PhantomJS(
            self.phantomjs_driver_path, desired_capabilities=dcap)
        # chrome_options = webdriver.ChromeOptions()
        # prefs = {"profile.managed_default_content_settings.images": 2}
        # chrome_options.add_experimental_option("prefs", prefs)
        # driver = webdriver.Chrome(self.chrome_driver_path, chrome_options=chrome_options)
        driver.get(request.url)
        try:
            # Wait up to 15 seconds for any of the expected content nodes
            element = WebDriverWait(driver, 15).until(
                ec.presence_of_element_located(
                    (By.XPATH, '//div[@class="txt-box"]|//h4[@class="weui_media_title"]|//div[@class="rich_media_content "]'))
            )
            body = driver.page_source
            time.sleep(1)
            driver.quit()
            return HtmlResponse(request.url, body=body, encoding='utf-8', request=request)
        except:
            # Note: this also runs if the driver.quit() call above raised
            driver.quit()
            spider.logger.error('Ignore request, url: {}'.format(request.url))
            raise IgnoreRequest()

I don't know what causes this error.
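
For context on the error code itself: errno 9 (EBADF) is what os.close reports when the descriptor it is given is not open, for example because it was already closed. The sketch below shows this with a temporary file and no selenium involved. The function name close_twice is made up for illustration, and the sketch does not establish why the descriptor is invalid in the middleware above.

```python
import errno
import os
import tempfile


def close_twice():
    """Close one descriptor twice; return the errno raised by the second close."""
    fd, path = tempfile.mkstemp()
    try:
        os.close(fd)            # first close succeeds
        try:
            os.close(fd)        # the descriptor is no longer open
        except OSError as exc:
            return exc.errno
        return None             # not expected in a single-threaded run
    finally:
        os.remove(path)         # clean up the temporary file


if __name__ == "__main__":
    print(close_twice() == errno.EBADF)
```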

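Best answer

As of July 2016, driver.close() and driver.quit() were not sufficient for me. They killed the node process, but not the phantomjs child process it had spawned.

Following the discussion on this GitHub issue, the only solution that worked for me was to run: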

import signal

driver.service.process.send_signal(signal.SIGTERM) # kill the specific phantomjs child proc
driver.quit()                                      # quit the node proc
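
One possible way to package the two calls above for reuse in a middleware is a small helper. This is a sketch under assumptions, not part of the original answer: the name shutdown_driver and the logger parameter are hypothetical, it assumes driver.service.process is the subprocess handle selenium keeps for phantomjs (as used in the snippet above), and catching OSError from driver.quit() is an extra choice that only suits cases where a failed shutdown can safely be logged and skipped.

```python
import signal


def shutdown_driver(driver, logger=None):
    """Send SIGTERM to the phantomjs child, then call driver.quit() once.

    Hypothetical helper. Returns True if quit() completed, False if it
    raised OSError (for example [Errno 9] Bad file descriptor).
    """
    service = getattr(driver, "service", None)
    process = getattr(service, "process", None)
    # Only signal a child process that is still running
    if process is not None and process.poll() is None:
        process.send_signal(signal.SIGTERM)
    try:
        driver.quit()
    except OSError as exc:
        # Assumption: a failed shutdown is not fatal for the crawl
        if logger is not None:
            logger.warning("driver.quit() failed: %s", exc)
        return False
    return True
```

In the middleware from the question, this could be called with shutdown_driver(driver, spider.logger) in place of each direct driver.quit() call.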

For "python - phantomjs raises OSError: [Errno 9] Bad file descriptor", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44995042/
