python - Difference between BaseSpider and CrawlSpider

Reposted by: 太空狗 | Updated: 2023-10-29 20:16:48

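I have been trying to understand the concept of using BaseSpider and CrawlSpider in web scraping. I have read the docs, but they make no mention of BaseSpider. It would be really helpful if someone could explain the difference between BaseSpider and CrawlSpider.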

Best answer

BaseSpider is an older class that is now deprecated (since Scrapy 0.22). Use scrapy.Spider instead:

import scrapy

class MySpider(scrapy.Spider):
    # ...

scrapy.Spider is the simplest spider. It basically visits the URLs defined in start_urls, or the ones returned by start_requests().

Use CrawlSpider when you need "crawling" behavior, that is, extracting links and following them:

This is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. It may not be the best suited for your particular web sites or project, but it’s generic enough for several cases, so you can start from it and override it as needed for more custom functionality, or just implement your own spider.

Regarding python - Difference between BaseSpider and CrawlSpider, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32632001/
