
Scrapy pipelines load but do nothing

Reposted. Author: 行者123. Updated: 2023-12-02 02:10:42

I have a Scrapy project that loads its pipelines but does not pass items to them. Any help is appreciated.

A stripped-down version of the spider:

#imports
class MySpider(CrawlSpider):
    #RULES AND STUFF

    def parse_item(self, response):
        '''Takes HTML response and turns it into an item ready for database. I hope.
        '''
        #A LOT OF CODE
        return item

Printing the item at this point produces the expected result, and settings.py is very simple:

ITEM_PIPELINES = [
    'mySpider.pipelines.MySpiderPipeline',
    'mySpider.pipelines.PipeCleaner',
    'mySpider.pipelines.DBWriter',
]
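For readers on newer Scrapy releases: there, ITEM_PIPELINES is expected to be a dict that maps each class path to an integer order, with lower values running first, rather than a list. A sketch of the same setting in that form follows; the order values 100, 200 and 300 are chosen here only for illustration and are not from the original question.

```python
# settings.py (dict form used by newer Scrapy releases).
# The integers are illustrative assumptions; lower values run earlier.
ITEM_PIPELINES = {
    'mySpider.pipelines.MySpiderPipeline': 100,
    'mySpider.pipelines.PipeCleaner': 200,
    'mySpider.pipelines.DBWriter': 300,
}

# The effective order is the class paths sorted by their integer value.
pipeline_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

With these values the order matches the list above: MySpiderPipeline, then PipeCleaner, then DBWriter.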

The pipelines look correct (imports omitted):

class MySpiderPipeline(object):
    def process_item(self, item, spider):
        print 'PIPELINE: got ', item['name']
        return item

class DBWriter(object):
    """Writes each item to a DB. I hope.
    """
    def __init__(self):
        self.dbpool = adbapi.ConnectionPool('MySQLdb'
            , host=settings['HOST']
            , port=int(settings['PORT'])
            , user=settings['USER']
            , passwd=settings['PASS']
            , db=settings['BASE']
            , cursorclass=MySQLdb.cursors.DictCursor
            , charset='utf8'
            , use_unicode=True
            )
        print('init DBWriter')

    def process_item(self, item, spider):
        print 'DBWriter process_item'
        query = self.dbpool.runInteraction(self._insert, item)
        query.addErrback(self.handle_error)
        return item

    def _insert(self, tx, item):
        print 'DBWriter _insert'
        # A LOT OF UNRELATED CODE HERE
        return item

class PipeCleaner(object):
    def __init__(self):
        print 'Cleaning these pipes.'

    def process_item(self, item, spider):
        print item['name'], ' is cleeeeaaaaannn!!'
        return item

When I run the spider, I get this output at startup:

Cleaning these pipes.
init DBWriter
2012-10-23 15:30:04-0400 [scrapy] DEBUG: Enabled item pipelines: MySpiderPipeline, PipeCleaner, DBWriter

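Unlike the __init__ messages, which are printed to the screen when the spider starts, the process_item methods never print (or process) anything. I am hoping I have just forgotten something very simple.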

Best answer

2012-10-23 15:30:04-0400 [scrapy] DEBUG: Enabled item pipelines: MySpiderPipeline, PipeCleaner, DBWriter

This line shows that your pipelines are being initialized and that they are fine.
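To see why the startup messages alone prove little, here is a toy model of the item flow. It is a simplified sketch and not Scrapy's actual implementation; `run_crawl` and the `seen` list are hypothetical names introduced only for this illustration. A pipeline's `__init__` runs once when the pipeline is built, while `process_item` runs only for items that a spider callback actually returns.

```python
class PipeCleaner(object):
    def __init__(self):
        # Runs once, when the pipeline object is created at startup.
        self.seen = []

    def process_item(self, item, spider):
        # Runs once for every item that actually reaches the pipeline.
        self.seen.append(item['name'])
        return item


def run_crawl(callback_results, pipelines, spider=None):
    """Toy model: pass each item the callback produced through every pipeline in order."""
    processed = []
    for item in callback_results:
        for pipeline in pipelines:
            item = pipeline.process_item(item, spider)
        processed.append(item)
    return processed


pipelines = [PipeCleaner()]                              # __init__ has already run here
no_items = run_crawl([], pipelines)                      # the callback produced nothing
seen_after_empty_run = list(pipelines[0].seen)           # still empty: process_item never ran
one_item = run_crawl([{'name': 'example'}], pipelines)   # the callback returned one item
```

In this model, an empty run leaves `seen` empty even though the pipeline was initialized, which matches the symptom in the question.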

The problem is in your spider class:

class MySpider(CrawlSpider):
    #RULES AND STUFF

    def parse_item(self, response):
        '''Takes HTML response and turns it into an item ready for database. I hope.
        '''
        #A LOT OF CODE
        # before returning item , print it
        return item

I think you should print the item in MySpider just before returning it.
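One way to do that check without scattering print statements is to wrap the callback so that whatever it returns is logged. The sketch below is only an illustration: `log_returned_item` and `DemoSpider` are hypothetical names, and `DemoSpider` is a plain stand-in class so that the example runs without Scrapy installed.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_returned_item(callback):
    """Hypothetical helper: log whatever a spider callback returns."""
    @functools.wraps(callback)
    def wrapper(self, response):
        item = callback(self, response)
        logger.info('%s returned %r', callback.__name__, item)
        return item
    return wrapper


class DemoSpider(object):
    # Stand-in for the real spider, so the sketch runs without Scrapy.
    @log_returned_item
    def parse_item(self, response):
        return {'name': response}


result = DemoSpider().parse_item('example')
```

If nothing is ever logged, the callback itself is probably not being called, and the place to look is whatever routes responses to it (the part of the spider elided as "RULES AND STUFF"). If the callback used `yield` instead of `return`, this wrapper would log a generator object rather than the item.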

Regarding "Scrapy pipelines load but do nothing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13038383/
