python - Why does Scrapy skip part of the loop?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:30:06
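This spider is supposed to loop over http://www.saylor.org/site/syllabus.php?cid=NUMBER, where NUMBER runs from 1 to 404, and extract each page. But for some reason it skips pages in the loop. Many pages. For example, it skips 1 through 16. Can someone tell me what is going on?

Here is the code: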

from scrapy.spider import BaseSpider
from scrapy.http import Request
from opensyllabi.items import OpensyllabiItem

import boto

class OpensyllabiSpider(BaseSpider):
    name = 'saylor'
    allowed_domains = ['saylor.org']
    max_cid = 405
    i = 1

    def start_requests(self):
        for self.i in range(1, self.max_cid):
            yield Request('http://www.saylor.org/site/syllabus.php?cid=%d' % self.i, callback=self.parse_Opensyllabi)

    def parse_Opensyllabi(self, response):
        Opensyllabi = OpensyllabiItem()
        Opensyllabi['url'] = response.url
        Opensyllabi['body'] = response.body

        filename = ("/root/opensyllabi/data/saylor" + '%d' % self.i)
        syllabi = open(filename, "w")
        syllabi.write(response.body)

        return Opensyllabi

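Best answer

Try this. Scrapy runs callbacks asynchronously, so by the time parse_Opensyllabi runs, the loop in start_requests has usually moved self.i on to a later value. Several responses then write to the same file name, and the earliest numbers never appear. Attach the index to each request through meta and read it back in the callback: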

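from scrapy.spider import BaseSpider
from scrapy.http import Request
from opensyllabi.items import OpensyllabiItem

class OpensyllabiSpider(BaseSpider):
    name = 'saylor'
    allowed_domains = ['saylor.org']
    max_cid = 405

    def start_requests(self):
        # Use a local loop variable and send the index along with the request.
        for i in range(1, self.max_cid):
            yield Request('http://www.saylor.org/site/syllabus.php?cid=%d' % i,
                          meta={'index': i},
                          callback=self.parse_Opensyllabi)

    def parse_Opensyllabi(self, response):
        Opensyllabi = OpensyllabiItem()
        Opensyllabi['url'] = response.url
        Opensyllabi['body'] = response.body

        # The index comes from the request that produced this response.
        filename = ("/root/opensyllabi/data/saylor" + '%d' % response.request.meta['index'])
        # response.body is raw bytes, so write in binary mode and close the file.
        with open(filename, "wb") as syllabi:
            syllabi.write(response.body)

        return Opensyllabi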
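
To see why the shared counter loses the low numbers, here is a rough simulation that does not use Scrapy at all. The `crawl` function is a hypothetical stand-in for a scheduler that keeps a few requests in flight before any callback runs. The class names, the `in_flight` parameter, and the range of 10 ids are assumptions made for illustration, and Scrapy's real engine is more involved than this. The skipped range of 1 to 16 in the question is at least consistent with Scrapy's default of 16 concurrent requests, though that is an inference, not something the original thread states.

```python
from collections import deque


class SharedCounterSpider:
    """Mimics the question's spider: the loop variable lives on the instance."""

    def start_requests(self):
        for self.i in range(1, 11):
            yield {"cid": self.i}

    def parse(self, request):
        # Reads whatever value the loop has advanced to,
        # not the id of the request being handled.
        return self.i


class MetaSpider:
    """Mimics the answer's spider: each request carries its own index."""

    def start_requests(self):
        for i in range(1, 11):
            yield {"cid": i, "meta": {"index": i}}

    def parse(self, request):
        return request["meta"]["index"]


def crawl(spider, in_flight=4):
    """Hypothetical scheduler: pulls requests lazily and handles the oldest
    pending one whenever `in_flight` requests are waiting."""
    pending = deque()
    names = []
    for request in spider.start_requests():
        pending.append(request)
        if len(pending) == in_flight:
            names.append(spider.parse(pending.popleft()))
    # Drain whatever is still pending once the generator is exhausted.
    while pending:
        names.append(spider.parse(pending.popleft()))
    return names


shared = crawl(SharedCounterSpider())
tagged = crawl(MetaSpider())
print(shared)  # [4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
print(tagged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With the shared counter, file numbers 1 to 3 are never produced and 10 is written four times. With the index stored on the request, every response maps back to its own number regardless of when the callback runs.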

Regarding "python - Why does Scrapy skip part of the loop?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14346744/
