
python - Scrapy: response is not defined in the parse function

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:53:03

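I am trying to write a Scrapy spider combined with Selenium to access some JavaScript content on the page I am scraping. I have managed to open the page with Selenium and wait for the content to appear. Now I want to construct a Scrapy TextResponse from the fully loaded page. My code looks like this (I have removed the URLs and selector strings, which are not important):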

import scrapy
from scrapy import signals
from scrapy.http import TextResponse
from scrapy.xlib.pydispatch import dispatcher

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class EexSpider(scrapy.Spider):
    name = "eex"
    allowed_domain = ["..."]
    start_urls = ["..."]

    def __init__(self):
        self.driver = webdriver.Chrome()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        self.driver.get(response.url)
        wait = WebDriverWait(self.driver, 10)
        element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '...')))

        # this is where things go wrong
        print response.url # prints the correct url
        text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        # NameError: name 'response' is not defined

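When I run the crawler, I get the error NameError: name 'response' is not defined on the line that calls the TextResponse constructor. Strangely, I can print response.url successfully on the line just before it.

Does anyone know why this happens?

P.S. Let me know if you want to see the stack trace; I just did not want to make the question any longer.

Disclaimer: I am a complete Python newbie ;-)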

Best answer

Check that the line containing TextResponse is indented correctly.

For example, if I have the following code:

import scrapy
from scrapy import signals
from scrapy.http import TextResponse
from scrapy.xlib.pydispatch import dispatcher

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class EexSpider(scrapy.Spider):
    name = "eex"
    allowed_domain = ["google.com"]
    start_urls = ["http://google.com"]

    def __init__(self):
        self.driver = webdriver.Chrome()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        self.driver.get(response.url)

    # indented one level too shallow, so this line is outside parse()
    text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')

I get exactly the same error:

NameError: name 'response' is not defined
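To see why a mis-indented line produces this error, here is a minimal sketch that needs neither Scrapy nor Selenium. The names BrokenSpider and define_spider_with_bad_indent are made up for illustration and are not part of the original code. It assumes the offending line ended up at class-body level: such a line runs while the class is being defined, where no response variable exists, whereas inside parse() the name is an ordinary parameter.

```python
def define_spider_with_bad_indent():
    """Define a class with a mis-indented line and return the resulting error."""
    try:
        class BrokenSpider(object):  # hypothetical stand-in for the spider
            def parse(self, response):
                # Inside parse(), `response` is a parameter, so this is fine.
                return response.url

            # One level too shallow: this statement belongs to the class body
            # and is executed at class-definition time, before any request
            # has been made, so `response` cannot be resolved here.
            text = response.url
    except NameError as exc:
        return exc
    return None


error = define_spider_with_bad_indent()
print(error)
```

Moving the statement back under parse(), at the same indentation as the lines around it, makes the name resolvable again. One possible cause that the question does not confirm is a mix of tabs and spaces, which can look aligned in an editor but be parsed at a different level; on Python 2, running the script with python -tt turns inconsistent tab usage into an error.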

Regarding python - Scrapy: response is not defined in the parse function, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35701608/
