
mysql - Pipeline does not write to MySQL, but also gives no error


I tried to implement this pipeline in my spider. After installing the necessary dependencies, I can run the spider without any errors, but for some reason it does not write to my database.

I'm fairly sure something is going wrong with the database connection. When I enter a wrong password, I still don't get any error.

Once the spider has scraped all the data, it takes a few minutes before it starts dumping the stats.

2017-08-31 13:17:12 [scrapy] INFO: Closing spider (finished)
2017-08-31 13:17:12 [scrapy] INFO: Stored csv feed (27 items) in: test.csv
2017-08-31 13:24:46 [scrapy] INFO: Dumping Scrapy stats:
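As an aside, the silence on a wrong password is expected with adbapi: ConnectionPool connects lazily, so authentication only fails once the first interaction actually runs, and that failure only surfaces through whatever errback is attached. A minimal sketch (an assumption, not part of the original code) of an open_spider hook that could be added to the pipeline below to force an early connection and make such errors visible at startup:

    def open_spider(self, spider):
        # Sketch: run one trivial query so connection/authentication errors
        # surface at startup instead of staying hidden until the first insert.
        check = self.dbpool.runQuery("SELECT 1")
        check.addErrback(self._handle_error)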

Pipeline:

import MySQLdb.cursors
from twisted.enterprise import adbapi

from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.utils.project import get_project_settings
from scrapy import log

SETTINGS = {}
SETTINGS['DB_HOST'] = 'mysql.domain.com'
SETTINGS['DB_USER'] = 'username'
SETTINGS['DB_PASSWD'] = 'password'
SETTINGS['DB_PORT'] = 3306
SETTINGS['DB_DB'] = 'database_name'

class MySQLPipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

    def __init__(self, stats):
        print "init"
        # Instantiate DB
        self.dbpool = adbapi.ConnectionPool('MySQLdb',
            host=SETTINGS['DB_HOST'],
            user=SETTINGS['DB_USER'],
            passwd=SETTINGS['DB_PASSWD'],
            port=SETTINGS['DB_PORT'],
            db=SETTINGS['DB_DB'],
            charset='utf8',
            use_unicode=True,
            cursorclass=MySQLdb.cursors.DictCursor
        )
        self.stats = stats
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        print "close"
        """ Cleanup function, called after crawling has finished to close open
        objects.
        Close ConnectionPool. """
        self.dbpool.close()

    def process_item(self, item, spider):
        print "process"
        query = self.dbpool.runInteraction(self._insert_record, item)
        query.addErrback(self._handle_error)
        return item

    def _insert_record(self, tx, item):
        print "insert"
        result = tx.execute(
            " INSERT INTO matches(type,home,away,home_score,away_score) VALUES (soccer,"+item["home"]+","+item["away"]+","+item["score"].explode("-")[0]+","+item["score"].explode("-")[1]+")"
        )
        if result > 0:
            self.stats.inc_value('database/items_added')

    def _handle_error(self, e):
        print "error"
        log.err(e)
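Separately from why the pipeline never fires, the INSERT above would also fail on its own once _insert_record did get called: Python strings have split(), not explode(), and the values (including the soccer literal) are concatenated into the SQL unquoted. A sketch of a parameterized version, keeping the original table and column names:

    def _insert_record(self, tx, item):
        print "insert"
        # Parameterized variant (a sketch): let the MySQLdb driver handle
        # quoting/escaping, and split the "home-away" score with split().
        home_score, away_score = item["score"].split("-")
        result = tx.execute(
            "INSERT INTO matches (type, home, away, home_score, away_score) "
            "VALUES (%s, %s, %s, %s, %s)",
            ("soccer", item["home"], item["away"],
             home_score.strip(), away_score.strip())
        )
        if result > 0:
            self.stats.inc_value('database/items_added')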

Spider:

import scrapy
import dateparser
from crawling.items import KNVBItem

class KNVBspider(scrapy.Spider):
    name = "knvb"
    start_urls = [
        'http://www.knvb.nl/competities/eredivisie/uitslagen',
    ]
    custom_settings = {
        'ITEM_PIPELINES': {
            'crawling.pipelines.MySQLPipeline': 301,
        }
    }

    def parse(self, response):
        # www.knvb.nl/competities/eredivisie/uitslagen
        for row in response.xpath('//div[@class="table"]'):
            for div in row.xpath('./div[@class="row"]'):
                match = KNVBItem()
                match['home'] = div.xpath('./div[@class="value home"]/div[@class="team"]/text()').extract_first()
                match['away'] = div.xpath('./div[@class="value away"]/div[@class="team"]/text()').extract_first()
                match['score'] = div.xpath('./div[@class="value center"]/text()').extract_first()
                match['date'] = dateparser.parse(div.xpath('./preceding-sibling::div[@class="header"]/span/span/text()').extract_first(), languages=['nl']).strftime("%d-%m-%Y")
                yield match

If there is a better pipeline available for doing what I'm trying to achieve, that would be welcome as well. Thanks!

UPDATE: With the link provided in the accepted answer, I eventually ended up with this working function (and thereby solved my problem):

def process_item(self, item, spider):
    print "process"
    query = self.dbpool.runInteraction(self._insert_record, item)
    query.addErrback(self._handle_error)
    query.addBoth(lambda _: item)
    return query

Best Answer

Have a look at this for how to use adbapi with MySQL to save scraped items. Note the difference between your process_item and their process_item method implementations. While you return the item immediately, they return the Deferred object that is the result of the runInteraction method, and which returns the item upon its completion. I think this is the reason your _insert_record never gets called.

Regarding mysql - Pipeline does not write to MySQL, but also gives no error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45980896/
