
python - Trying to make a recursive crawling spider in Python. SyntaxError: non-keyword arg after keyword arg

Reposted · Author: 行者123 · Updated: 2023-12-01 04:45:10

I am trying to scrape multiple pages with Scrapy. My spider does parse the first start URL, but I can't get the spider's rules to take effect.

Here is what I have so far:

import scrapy

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from craigslist_sample.items import CraigslistSampleItem


class MySpider(CrawlSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths('a[@class="button next"]',)), callback='parse', follow=True),
    )

    def parse(self, response):
        for sel in response.xpath('//span[@class="pl"]'):
            item = CraigslistSampleItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            yield item

I get this error:

SyntaxError: non-keyword arg after keyword arg

Update:

Thanks to the answer below, the syntax error is gone, but my crawler just stays on the same page and does not crawl.
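One quick offline sanity check when rules don't fire is to run the Rule's `allow` pattern against candidate URLs with the stdlib `re` module (Scrapy applies `allow` patterns as a regex search against each extracted URL). The URLs below are hypothetical pagination links, made up for illustration:

```python
import re

# The allow pattern from the Rule above
allow = re.compile(r'.*?s=.*')

# Hypothetical pagination URLs, for illustration only
next_page = "http://sfbay.craigslist.org/npo/index100.html?s=100"
start = "http://sfbay.craigslist.org/npo/"

print(bool(allow.search(next_page)))  # True: the URL contains "s="
print(bool(allow.search(start)))      # False: no "s=" anywhere in the URL
```

If the pattern does not match the next-page URLs the site actually uses, the rule will never follow them, regardless of the rest of the spider.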

Updated code:

import scrapy

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from craigslist_sample.items import CraigslistSampleItem
from scrapy.contrib.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = "craigs"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=['.*?s=.*'], restrict_xpaths=('a[@class="button next"]')),
             callback='parse', follow=True),
    )

    def parse(self, response):
        for sel in response.xpath('//span[@class="pl"]'):
            item = CraigslistSampleItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            yield item
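The likely reason the updated spider stays on the first page is a documented CrawlSpider pitfall: CrawlSpider implements `parse()` itself and relies on it to apply the rules, so defining your own `parse` (and pointing `callback='parse'` at it) overrides that machinery. The usual fix is to rename the callback, e.g. `callback='parse_item'`. Since Scrapy may not be available here, the sketch below imitates the dispatch problem with a plain base class; `Crawler`, `parse_item`, and the page dict are all made up for illustration:

```python
# A toy stand-in for CrawlSpider: the framework's own parse() is what
# follows links; overriding it silently disables link-following.
class Crawler:
    def parse(self, response):
        # framework logic: run the user callback, then follow links
        yield from self.callback(response)
        for link in response.get("links", []):
            yield ("follow", link)

class BadSpider(Crawler):
    # names its callback "parse", clobbering the framework method
    def parse(self, response):
        yield ("item", response["title"])
    callback = parse

class GoodSpider(Crawler):
    # renamed callback leaves Crawler.parse intact
    def parse_item(self, response):
        yield ("item", response["title"])
    callback = parse_item

page = {"title": "npo listings", "links": ["page2"]}
print(list(BadSpider().parse(page)))   # [('item', 'npo listings')] -- links never followed
print(list(GoodSpider().parse(page)))  # [('item', 'npo listings'), ('follow', 'page2')]
```

The same renaming applies to the real spider: define `parse_item` instead of `parse` and set `callback='parse_item'` in the Rule.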

Best answer

Your problem is analogous to this (Python 3):

>>> print("hello")
hello
>>> print("hello", end=",,")
hello,,
>>> print(end=",,", "hello")
SyntaxError: non-keyword arg after keyword arg
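Note that the exact message depends on the interpreter: older versions say "non-keyword arg after keyword arg", while newer CPython 3 releases report it as "positional argument follows keyword argument". Either way it is a compile-time error, which you can confirm without executing the bad call at all:

```python
# The bad call never runs: compiling it already fails.
src = "print(end=',,', 'hello')"
try:
    compile(src, "<demo>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)  # message text varies by Python version
```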

The line:

Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths('a[@class="button next"]',)), callback='parse', follow=True),)

fails because `restrict_xpaths(...)` is a positional expression following the keyword argument `allow=...`; pass it as a keyword argument instead:

Rule(SgmlLinkExtractor(allow=('.*?s=.*',), restrict_xpaths=('a[@class="button next"]',)), callback='parse', follow=True),)

Regarding "python - Trying to make a recursive crawling spider in Python. SyntaxError: non-keyword arg after keyword arg", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29611518/
