
python - INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

Repost. Author: 行者123. Updated: 2023-11-28 01:31:10

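I have just started learning Python and Scrapy. My first project is to scrape information from a website containing web security information. But when I run it from cmd, it logs "Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)" and nothing seems to come out. I would be grateful if someone could solve my problem.

My code: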

import scrapy

class SapoSpider(scrapy.Spider):
    name = "imo"
    allowed_domains = ["imovirtual.com"]
    start_urls = ["https://www.imovirtual.com/arrendar/apartamento/lisboa/"]

    def parse(self,response):
        subpage_links = []
        for i in response.css('div.offer-item-details'):
            youritem = {
                'preco':i.css('span.offer-item title::text').extract_first(),
                'autor':i.css('li.offer-item-price::text').extract(),
                'data':i.css('li.offer-item-area::text').extract(),
                'data_2':i.css('li.offer-item-price-perm::text').extract()
            }
            subpage_link = i.css('header[class=offer-item-header] a::attr(href)').extract()
            subpage_links.extend(subpage_link)

            for subpage_link in subpage_links:
                yield scrapy.Request(subpage_link, callback=self.parse_subpage, meta={'item':youritem})

    def parse_subpage(self,response):
        for j in response.css('header[class=offer-item-header] a::attr(href)'):
            youritem = response.meta.get('item')
            youritem['info'] = j.css(' ul.dotted-list, li.h4::text').extract()
            yield youritem

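Best answer

Two things need to be corrected for this to work:

1. Set FEED_URI (done here through custom_settings) so that the scraped items are written to an output file.
2. In parse_subpage, query response directly instead of looping over 'header[class=offer-item-header] a::attr(href)'. The response passed to parse_subpage is already the downloaded detail page, not the listing page, so that loop most likely matches nothing and no item is ever yielded.

This should work: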

import scrapy


class SapoSpider(scrapy.Spider):
    name = "imo"
    allowed_domains = ["imovirtual.com"]
    start_urls = ["https://www.imovirtual.com/arrendar/apartamento/lisboa/"]
    # Fix 1: write the scraped items to a file.
    custom_settings = {
        'FEED_URI': './output.json'
    }

    def parse(self,response):
        subpage_links = []
        for i in response.css('div.offer-item-details'):
            youritem = {
                'preco':i.css('span.offer-item title::text').extract_first(),
                'autor':i.css('li.offer-item-price::text').extract(),
                'data':i.css('li.offer-item-area::text').extract(),
                'data_2':i.css('li.offer-item-price-perm::text').extract()
            }
            subpage_link = i.css('header[class=offer-item-header] a::attr(href)').extract()
            subpage_links.extend(subpage_link)

            for subpage_link in subpage_links:
                yield scrapy.Request(subpage_link, callback=self.parse_subpage, meta={'item':youritem})

    def parse_subpage(self,response):
        # Fix 2: response is already the detail page, so query it directly.
        youritem = response.meta.get('item')
        youritem['info'] = response.css(' ul.dotted-list, li.h4::text').extract()
        yield youritem

Regarding python - INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50957522/
