
python - Trying to make a recursive crawling spider in Python. SyntaxError: non-keyword arg after keyword arg

Reposted · Author: 行者123 · Updated: 2023-12-01 04:45:10

I am trying to scrape multiple pages with Scrapy. My spider does parse the first start URL, but I can't get the spider's rules to take effect.

Here is what I have so far:

import scrapy

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from craigslist_sample.items import CraigslistSampleItem


class MySpider(CrawlSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths('a[@class="button next"]',)), callback='parse', follow=True),
    )

    def parse(self, response):
        for sel in response.xpath('//span[@class="pl"]'):
            item = CraigslistSampleItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            yield item

I get this error:

SyntaxError: non-keyword arg after keyword arg

Update:

Thanks to the answer below, the syntax error is gone, but my crawler just stays on the same page and does not crawl.
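One quick offline sanity check when rules don't fire is to run the Rule's `allow` pattern against candidate URLs with the stdlib `re` module (Scrapy applies `allow` patterns as a regex search against each extracted URL). The URLs below are hypothetical pagination links, made up for illustration:

```python
import re

# The allow pattern from the Rule above
allow = re.compile(r'.*?s=.*')

# Hypothetical pagination URLs, for illustration only
next_page = "http://sfbay.craigslist.org/npo/index100.html?s=100"
start = "http://sfbay.craigslist.org/npo/"

print(bool(allow.search(next_page)))  # True: the URL contains "s="
print(bool(allow.search(start)))      # False: no "s=" anywhere in the URL
```

If the pattern does not match the next-page URLs the site actually uses, the rule will never follow them, regardless of the rest of the spider.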

Updated code:

import scrapy

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from craigslist_sample.items import CraigslistSampleItem
from scrapy.contrib.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=['.*?s=.*'], restrict_xpaths=('a[@class="button next"]')),
             callback='parse', follow=True),
    )

    def parse(self, response):
        for sel in response.xpath('//span[@class="pl"]'):
            item = CraigslistSampleItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            yield item
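The likely reason the updated spider stays on the first page is a documented CrawlSpider pitfall: CrawlSpider implements `parse()` itself and relies on it to apply the rules, so defining your own `parse` (and pointing `callback='parse'` at it) overrides that machinery. The usual fix is to rename the callback, e.g. `callback='parse_item'`. Since Scrapy may not be available here, the sketch below imitates the dispatch problem with a plain base class; `Crawler`, `parse_item`, and the page dict are all made up for illustration:

```python
# A toy stand-in for CrawlSpider: the framework's own parse() is what
# follows links; overriding it silently disables link-following.
class Crawler:
    def parse(self, response):
        # framework logic: run the user callback, then follow links
        yield from self.callback(response)
        for link in response.get("links", []):
            yield ("follow", link)

class BadSpider(Crawler):
    # names its callback "parse", clobbering the framework method
    def parse(self, response):
        yield ("item", response["title"])
    callback = parse

class GoodSpider(Crawler):
    # renamed callback leaves Crawler.parse intact
    def parse_item(self, response):
        yield ("item", response["title"])
    callback = parse_item

page = {"title": "npo listings", "links": ["page2"]}
print(list(BadSpider().parse(page)))   # [('item', 'npo listings')] -- links never followed
print(list(GoodSpider().parse(page)))  # [('item', 'npo listings'), ('follow', 'page2')]
```

The same renaming applies to the real spider: define `parse_item` instead of `parse` and set `callback='parse_item'` in the Rule.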

Best answer

Your problem is analogous to this (Python 3):

>>> print("hello")
hello
>>> print("hello", end=",,")
hello,,
>>> print(end=",,", "hello")
SyntaxError: non-keyword arg after keyword arg
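Note that the exact message depends on the interpreter: older versions say "non-keyword arg after keyword arg", while newer CPython 3 releases report it as "positional argument follows keyword argument". Either way it is a compile-time error, which you can confirm without executing the bad call at all:

```python
# The bad call never runs: compiling it already fails.
src = "print(end=',,', 'hello')"
try:
    compile(src, "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)  # message text varies by Python version
```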

The line:

Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths('a[@class="button next"]',)), callback='parse', follow=True),)

fails because `restrict_xpaths(...)` is a positional expression following the keyword argument `allow=...`; pass it as a keyword argument instead:

Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths=('a[@class="button next"]',)), callback='parse', follow=True),)

Regarding "python - Trying to make a recursive crawling spider in Python. SyntaxError: non-keyword arg after keyword arg", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29611518/
