
python - scrapy + couchbase: middleware or pipeline? How to store and retrieve data

Reposted. Author: 行者123. Updated: 2023-12-01 05:20:41

I want to use Scrapy together with Couchbase to store and retrieve data.

I am confused about which approach to take for storing and retrieving my data:

  1. Should I implement an item pipeline?

I mean something like:

    class CouchbasePipeline(object):
        def __init__(self):
            # init the Couchbase client here using settings
            pass

        def process_item(self, item, spider):
            # store the item here, then return it so later pipelines receive it
            return item

  2. Or should I implement a downloader middleware?

Something like:

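    class CouchBaseCacheStorage(object):
        # Scrapy's HttpCacheMiddleware calls these method names on the
        # class configured in HTTPCACHE_STORAGE

        def __init__(self, settings):
            # init the Couchbase client here using settings
            pass

        def open_spider(self, spider):
            pass

        def close_spider(self, spider):
            pass

        def retrieve_response(self, spider, request):
            # return a cached Response, or None on a cache miss
            return None

        def store_response(self, spider, request, response):
            # save the response here
            pass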
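The class in option 2 is not a downloader middleware by itself: it has the shape of a cache storage backend, which Scrapy's built-in HttpCacheMiddleware (a downloader middleware) loads from the HTTPCACHE_STORAGE setting, constructs with the settings object, and calls through open_spider, close_spider, retrieve_response and store_response. Below is a minimal sketch of that interface, assuming a plain dict in place of Couchbase. The class name InMemoryCacheStorage is hypothetical, and keying on method plus URL is a simplification (Scrapy's own backends key on a request fingerprint and rebuild the Response from stored data).

```python
class InMemoryCacheStorage:
    """Hypothetical dict-backed HTTP cache storage, for illustration only."""

    def __init__(self, settings):
        # A real backend would read its connection details from settings here
        self._settings = settings
        self._cache = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the store
        self._cache = {}

    def close_spider(self, spider):
        # Called once when the spider finishes: release the store
        self._cache = None

    def _key(self, request):
        # Simplified cache key (assumption): HTTP method + URL
        return (request.method, request.url)

    def retrieve_response(self, spider, request):
        # Return the cached response, or None so the request is downloaded normally
        return self._cache.get(self._key(request))

    def store_response(self, spider, request, response):
        # Save the downloaded response for later runs
        self._cache[self._key(request)] = response
```

Returning None from retrieve_response is what signals a cache miss, so the middleware lets the request go to the downloader and then hands the fresh response to store_response.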

Or maybe I should implement both (one to manage the cache, one for the database)?
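The two mechanisms are independent and are enabled separately in the project settings: item pipelines through ITEM_PIPELINES (a dict of class path to order value, customarily in the 0-1000 range) and the HTTP cache through HTTPCACHE_ENABLED plus HTTPCACHE_STORAGE. A sketch of settings.py, assuming a hypothetical project package called myproject with the pipeline in pipelines.py and the cache storage in cache.py:

```python
# settings.py for a hypothetical project named "myproject"

# Item pipeline: receives every item the spider yields; lower numbers run first
ITEM_PIPELINES = {
    'myproject.pipelines.CouchbasePipeline': 300,
}

# HTTP cache: HttpCacheMiddleware delegates storage to this class
HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = 'myproject.cache.CouchBaseCacheStorage'
```

The pipeline handles scraped items (the data you keep), while the cache storage handles raw HTTP responses (to avoid re-downloading pages), so using one does not require the other.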

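I am really confused, especially since I am new to Python, Couchbase and Scrapy. My question is not about the best implementation or tool for the job, but about the standard Scrapy way of doing this, because I could not find it in the documentation or on the web.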

Thanks in advance for your help.

Best answer

Suggested code, written following @agstudy's answer.

See below:

    import logging
    from datetime import datetime

    from scrapy import signals
    from couchbase import Couchbase  # couchbase Python SDK 1.x API
    from couchbase.exceptions import CouchbaseError

    logger = logging.getLogger(__name__)


    class CouchbaseStore(object):

        @classmethod
        def from_crawler(cls, crawler):
            o = cls(crawler.settings)
            crawler.signals.connect(o.spider_opened, signal=signals.spider_opened)
            crawler.signals.connect(o.spider_closed, signal=signals.spider_closed)
            return o

        def __init__(self, settings):
            self._server = settings.get('COUCHBASE_SERVER')
            self._port = settings.getint('COUCHBASE_PORT', 8091)
            self._bucket = settings.get('COUCHBASE_BUCKET')
            self._password = settings.get('COUCHBASE_PASSWORD')
            self.couchbase = None  # set in spider_opened

        def process_item(self, item, spider):
            # Couchbase stores JSON documents, so serialize datetimes as ISO 8601 strings
            data = {}
            for key in item.keys():
                if isinstance(item[key], datetime):
                    data[key] = item[key].isoformat()
                else:
                    data[key] = item[key]
            # I assume items have a unique 'time' field, used as the document key
            key = item['time'].isoformat()
            self.couchbase.set(key, data)
            logger.info("Item with key %s stored in bucket %s / node %s",
                        key, self._bucket, self._server)
            return item

        def spider_opened(self, spider):
            try:
                self.couchbase = Couchbase.connect(bucket=self._bucket,
                                                   host=self._server,
                                                   port=self._port,
                                                   password=self._password)
            except CouchbaseError:
                logger.exception('Connection problem to bucket %s', self._bucket)
            logger.debug("CouchbaseStore.spider_opened called")

        def spider_closed(self, spider):
            if self.couchbase is not None:
                self.couchbase._close()
            logger.debug("CouchbaseStore.spider_closed called")
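As a variation on the code above: item pipelines may also define open_spider and close_spider, which Scrapy calls automatically, so the signal wiring in from_crawler is optional. The sketch below shows that pattern with the same datetime serialization and 'time'-based key, but swaps the Couchbase connection for a hypothetical in-memory InMemoryBucket (set/get only) so it runs without a Couchbase server. DocumentStorePipeline and the bucket_factory parameter are likewise hypothetical names used only for illustration.

```python
import json
from datetime import datetime


class InMemoryBucket:
    """Hypothetical stand-in for a Couchbase bucket: only set/get."""

    def __init__(self):
        self._docs = {}

    def set(self, key, value):
        # Round-trip through JSON to mimic storing a JSON document
        self._docs[key] = json.dumps(value)

    def get(self, key):
        return json.loads(self._docs[key])


class DocumentStorePipeline:
    def __init__(self, bucket_factory):
        # Hypothetical hook: a real pipeline would connect with the Couchbase SDK
        self._bucket_factory = bucket_factory
        self.bucket = None

    def open_spider(self, spider):
        # Scrapy calls this on item pipelines when the spider starts
        self.bucket = self._bucket_factory()

    def close_spider(self, spider):
        # Scrapy calls this on item pipelines when the spider finishes
        self.bucket = None

    def process_item(self, item, spider):
        # Convert datetimes to ISO 8601 strings so the document is JSON-serializable
        data = {k: (v.isoformat() if isinstance(v, datetime) else v)
                for k, v in dict(item).items()}
        # Assumes a unique 'time' field, as in the answer's code
        self.bucket.set(data['time'], data)
        return item
```

Storing under a predictable key is also what makes retrieval straightforward later: the same key fetches the document back with a plain get.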

A similar question, "python - scrapy + couchbase: middleware or pipeline? How to store and retrieve data", can be found on Stack Overflow: https://stackoverflow.com/questions/22467111/
