
python - Scrapy blank AssertionError?

Reposted | Author: 太空宇宙 | Updated: 2023-11-03 18:00:55

The code below throws the following error for every request sent to the parse method (Scrapy v0.24.4):

2014-12-30 01:20:06+0000 [yelp_spider] DEBUG: Crawled (200) <GET http://www.yelp.com/biz/lookout-tavern-oak-bluffs> (referer: http://www.yelp.com/search?find_desc=Restaurants&find_loc=02557&ns=1) ['partial']
2014-12-30 01:20:06+0000 [yelp_spider] ERROR: Spider error processing <GET http://www.yelp.com/biz/lookout-tavern-oak-bluffs>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/scrapy/core/scraper.py", line 111, in _scrape_next
    self._scrape(response, request, spider).chainDeferred(deferred)
  File "/usr/lib/python2.7/site-packages/scrapy/core/scraper.py", line 118, in _scrape
    dfd = self._scrape2(response, request, spider) # returns spiders processed output
  File "/usr/lib/python2.7/site-packages/scrapy/core/scraper.py", line 128, in _scrape2
    request_result, request, spider)
  File "/usr/lib/python2.7/site-packages/scrapy/core/spidermw.py", line 69, in scrape_response
    dfd = mustbe_deferred(process_spider_input, response)
--- <exception caught here> ---
  File "/usr/lib/python2.7/site-packages/scrapy/utils/defer.py", line 39, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/site-packages/scrapy/core/spidermw.py", line 48, in process_spider_input
    return scrape_func(response, request, spider)
  File "/usr/lib/python2.7/site-packages/scrapy/core/scraper.py", line 138, in call_spider
    dfd.addCallbacks(request.callback or spider.parse, request.errback)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 288, in addCallbacks
    assert callable(callback)
exceptions.AssertionError:

The code:

import scrapy
from scrapy import Request
import re

ROOT_URL = "http://www.yelp.com"

class YelpReview(scrapy.Item):
    zip_code = scrapy.Field()
    review_date = scrapy.Field()

class yelp_spider(scrapy.Spider):
    name = 'yelp_spider'
    allowed_domains = ['yelp.com']
    start_urls = ["http://www.yelp.com/search?find_desc=Restaurants&find_loc=02557&ns=1"]

    def parse(self, response):
        business_urls = [business_url.extract() for
                         business_url in response.xpath('//a[@class="biz-name"]/@href')[1:]
                         ]
        for business_url in business_urls:
            yield Request(url=ROOT_URL + business_url, callback="scrape_reviews")

        if response.url.find('?start=') == -1:
            self.createRestaurantPageLinks(response)

    def scrape_reviews(self, response):
        reviews = response.xpath('//meta[@itemprop="datePublished"]/@content')
        item = YelpReview()

        for review in reviews:
            item['zip_code'] = "02557"
            item['review_date'] = review.extract()
            yield item

        if response.url.find('?start=') == -1:
            self.createReviewPageLinks(response)

    def createRestaurantPageLinks(self, response):
        raw_num_results = response.xpath('//span[@class="pagination-results-window"]/text()').extract()[0]
        num_business_results = int(re.findall(" of (\d+)", raw_num_results)[0])
        BUSINESSES_PER_PAGE = 10
        restaurant_page_links = [Request(url=response.url + '?start=' + str(BUSINESSES_PER_PAGE*(n+1)),
                                         callback="parse") for n in range(num_business_results/BUSINESSES_PER_PAGE)]

        return restaurant_page_links

    def createReviewsPageLinks(self, response):
        REVIEWS_PER_PAGE = 40
        num_review_results = int(response.xpath('//span[@itemprop="reviewCount"]/text()').extract()[0])
        review_page_links = [Request(url=response.url + '?start=' + str(REVIEWS_PER_PAGE*(n+1)),
                                     callback="scrape_reviews") for n in range(num_review_results/REVIEWS_PER_PAGE)]

        return review_page_links

I have tried a number of changes but still cannot figure out what triggers this error.

Best Answer

You need to return from the parse() method:

if response.url.find('?start=') == -1:
    return self.createRestaurantPageLinks(response)
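The snippet above takes care of the page-link requests that were being created but never returned. Separately, note the last line of the traceback: `assert callable(callback)` fires inside Twisted's `addCallbacks`, which requires an actual callable, while the question's code passes method *names* as strings (e.g. `callback="scrape_reviews"`). Passing the bound method instead (`callback=self.scrape_reviews`) satisfies the assertion. The distinction can be checked in plain Python without Scrapy; `SpiderLike` below is just an illustrative stand-in, not part of either library:

```python
class SpiderLike(object):
    def parse(self, response):
        return response

spider = SpiderLike()

# A string naming the method is not callable, which is exactly what
# Twisted's `assert callable(callback)` rejects...
assert not callable("parse")

# ...while the bound method itself is callable, which is what
# Request(callback=...) expects.
assert callable(spider.parse)
```

The same change applies to both `callback="parse"` in createRestaurantPageLinks and `callback="scrape_reviews"` in the other two places.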

Regarding "python - Scrapy blank AssertionError?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27698440/
