
python - Django-dynamic-scraper cannot scrape data


I am new to django-dynamic-scraper and followed the open_news example to learn it. I have completed all the setup, but it keeps showing me the same error: dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist.

2015-11-20 18:45:11+0000 [article_spider] ERROR: Spider error processing <GET https://en.wikinews.org/wiki/Main_page>
Traceback (most recent call last):
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/base.py", line 825, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 645, in _tick
    taskObj._oneWorkUnit()
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 491, in _oneWorkUnit
    result = next(self._iterator)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
    yield next(it)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
    for x in result:
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/spiders/django_spider.py", line 378, in parse
    rpt = self.scraper.get_rpt_for_scraped_obj_attr(url_elem.scraped_obj_attr)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/models.py", line 98, in get_rpt_for_scraped_obj_attr
    return self.requestpagetype_set.get(scraped_obj_attr=soa)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/query.py", line 334, in get
    self.model._meta.object_name
dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist.

Best Answer

This is caused by missing "Request page type" entries. Every entry under "SCRAPER ELEMS" must have its own "Request page type".

To fix this, follow these steps:

  1. Log in to the admin site (usually http://localhost:8000/admin/)
  2. Go to Home › Dynamic_Scraper › Scrapers › Wikinews Scraper (Article)
  3. Under "REQUEST PAGE TYPES", click "Add another Request page type"
  4. Create one "Request page type" for each of "(base (Article))", "(title (Article))", "(description (Article))" and "(url (Article))", four in total

"Request page type" settings:

Set "Content type" to "HTML" for all of them.

Set "Request type" to "Request" for all of them.

Set "Method" to "GET" for all of them.

For "Page type", just assign them in order:

(base (Article)) | Main Page
(title (Article)) | Detail Page 1
(description (Article)) | Detail Page 2
(url (Article)) | Detail Page 3
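If you would rather script this than click through the admin, the same four rows can be created from the Django shell. The following is only a minimal sketch, not the official setup path: the choice keys (page_type values 'MP'/'DP1'/'DP2'/'DP3', content_type 'H', request_type 'R') and field names are assumptions based on the django-dynamic-scraper models of that era, so verify them against your installed version before running.

# Run inside "python manage.py shell".
from dynamic_scraper.models import Scraper, ScrapedObjAttr, RequestPageType

scraper = Scraper.objects.get(name='Wikinews Scraper (Article)')

# One page type per scraped object attribute, assigned in order:
# 'MP' = Main Page, 'DP1'..'DP3' = Detail Page 1..3 (choice keys assumed).
page_types = {'base': 'MP', 'title': 'DP1', 'description': 'DP2', 'url': 'DP3'}

for attr_name, page_type in page_types.items():
    attr = ScrapedObjAttr.objects.get(
        name=attr_name, obj_class=scraper.scraped_obj_class)
    RequestPageType.objects.get_or_create(
        scraper=scraper,
        scraped_obj_attr=attr,
        page_type=page_type,
        content_type='H',   # 'H' = HTML
        request_type='R',   # 'R' = Request
        method='GET',
    )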

Once the steps above are done, the "DoesNotExist: RequestPageType" error should be fixed.

However, a new error will appear: "ERROR: Mandatory elem title missing!"

To solve it, I suggest changing the "Request page type" of every entry under "SCRAPER ELEMS" to "Main Page", including "title (Article)".

Then change the XPaths as follows:

(base (Article)) | //td[@class="l_box"]
(title (Article)) | span[@class="l_title"]/a/@title
(description (Article)) | p/span[@class="l_summary"]/text()
(url (Article)) | span[@class="l_title"]/a/@href
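As before, this can also be done from the Django shell instead of the admin. Here is a hedged sketch; the ScraperElem field names (x_path, request_page_type) and the 'MP' choice key are assumptions based on dynamic_scraper's models, so check them in your version:

from dynamic_scraper.models import Scraper, ScraperElem

scraper = Scraper.objects.get(name='Wikinews Scraper (Article)')

# New XPath per attribute; every elem is now scraped from the Main Page.
new_xpaths = {
    'base': '//td[@class="l_box"]',
    'title': 'span[@class="l_title"]/a/@title',
    'description': 'p/span[@class="l_summary"]/text()',
    'url': 'span[@class="l_title"]/a/@href',
}

for elem in ScraperElem.objects.filter(scraper=scraper):
    elem.x_path = new_xpaths[elem.scraped_obj_attr.name]
    elem.request_page_type = 'MP'  # 'MP' = Main Page (choice key assumed)
    elem.save()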

Finally, run scrapy crawl article_spider -a id=1 -a do_action=yes at the command prompt. You should now be able to scrape "Articles", and you can view them under Home › Open_News › Articles.
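If you want to confirm the results from code rather than the admin, you can query the example app's model from the Django shell (this assumes the open_news example's Article model with its title and url fields):

from open_news.models import Article

print(Article.objects.count())          # number of scraped articles
for article in Article.objects.all()[:5]:
    print(article.title, article.url)   # spot-check a few entries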

Enjoy~

Regarding "python - Django-dynamic-scraper cannot scrape data", see the similar question on Stack Overflow: https://stackoverflow.com/questions/33833986/
