Python, Scrapy, Pipeline: function "process_item" not getting called

Reposted. Author: 太空狗. Updated: 2023-10-29 20:51:36

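I have some very simple code, shown below. Scraping works fine: I can see all the print statements producing the correct data. In the pipeline, initialization works fine. However, the process_item function is never called, because the print statement at the start of the function never executes.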

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'


    def parse_location(self, response):
        print 'in parse_location'
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

Items file:

import scrapy
from scrapy.item import Item, Field

class ComoShamLocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()

Pipeline file:

import csv


class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv','wb'))
        self.locationdump.writerow(['Address','Pin','Phone','Fax','Email'])


    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass

Best answer

Your problem is that you never actually yield the item. parse_location returns the item to parse, but parse never yields that item.

The solution is to replace:

    self.parse_location(response)

with:

    yield self.parse_location(response)
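
The difference between the two lines can be seen without Scrapy. The following is a minimal sketch in plain Python 3; parse_without_yield, parse_with_yield, and collect are hypothetical stand-ins introduced for illustration, not the asker's spider or Scrapy's real engine. It shows that a callback which only calls a helper discards the helper's return value, while a callback that yields the result hands the item back to its caller.

```python
def parse_location(response):
    # Stand-in for the asker's helper: builds and returns one item.
    return {"category": "location", "source": response}


def parse_without_yield(response):
    # The return value of parse_location is discarded here, so this
    # callback returns None and hands nothing back to its caller.
    parse_location(response)


def parse_with_yield(response):
    # `yield` turns the callback into a generator that emits the item.
    yield parse_location(response)


def collect(callback_result):
    # Hypothetical simplification of what a framework does with a
    # callback's result: treat None as "no output", else iterate it.
    return list(callback_result or [])


print(collect(parse_without_yield("page")))  # []
print(collect(parse_with_yield("page")))     # [{'category': 'location', 'source': 'page'}]
```

In this sketch the first call produces an empty list because the item never leaves parse_without_yield, which mirrors what happens in the question's parse method.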

More specifically, process_item is never called if no item is yielded.
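
To illustrate that relationship, here is a small simulation in plain Python 3. RecordingPipeline and run are hypothetical names, and run is only an assumed, heavily simplified stand-in for the way an engine feeds yielded items to a pipeline; it is not Scrapy's actual implementation. The pipeline also returns the item from process_item, which Scrapy's documentation asks pipelines to do so that any later pipeline still receives it.

```python
class RecordingPipeline:
    """Hypothetical pipeline that records each item it receives."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        self.seen.append(item)
        # Return the item so a later pipeline stage would receive it too.
        return item


def run(callback, pipeline):
    # Hypothetical stand-in for the engine: every object the callback
    # yields is passed to process_item; nothing yielded means no calls.
    for item in callback("response") or []:
        pipeline.process_item(item, spider=None)
    return len(pipeline.seen)


def silent_parse(response):
    item = {"category": "location"}  # built, but never yielded
    return None


def yielding_parse(response):
    yield {"category": "location"}


print(run(silent_parse, RecordingPipeline()))    # 0
print(run(yielding_parse, RecordingPipeline()))  # 1
```

With silent_parse the pipeline is constructed but process_item is never invoked, which matches the symptom in the question: initialization runs, yet the print statement inside process_item never fires.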

Regarding Python, Scrapy, Pipeline: function "process_item" not getting called, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31331411/
