
python - PyCharm Scrapy configuration


I am new to Scrapy and am trying to configure PyCharm to run my Scrapy project. I get an error when debugging the program. I then also tried adding my Scrapy project to PyCharm as a content root, like this:

File -> Settings -> Project Structure -> Add Content Root. It didn't work.

import scrapy
from scrapy.spiders import SitemapSpider
from scrapy.spiders import Spider
from scrapy.http import Request, XmlResponse
from scrapy.utils.sitemap import Sitemap, sitemap_urls_from_robots
from scrapy.utils.gz import gunzip, is_gzipped
import re
import requests


class GetpagesfromsitemapSpider(SitemapSpider):
    name = "test"
    handle_httpstatus_list = [404]

    def parse(self, response):
        print response.url

    def _parse_sitemap(self, response):
        if response.url.endswith('/robots.txt'):
            for url in sitemap_urls_from_robots(response.body):
                yield Request(url, callback=self._parse_sitemap)
        else:
            body = self._get_sitemap_body(response)
            if body is None:
                self.logger.info('Ignoring invalid sitemap: %s', response.url)
                return

            s = Sitemap(body)
            sites = []
            if s.type == 'sitemapindex':
                for loc in iterloc(s, self.sitemap_alternate_links):
                    if any(x.search(loc) for x in self._follow):
                        yield Request(loc, callback=self._parse_sitemap)
            elif s.type == 'urlset':
                for loc in iterloc(s):
                    for r, c in self._cbs:
                        if r.search(loc):
                            sites.append(loc)
                            break
            print sites

    def __init__(self, spider=None, *a, **kw):
        super(GetpagesfromsitemapSpider, self).__init__(*a, **kw)
        self.spider = spider
        l = []
        url = "https://channelstore.roku.com"
        resp = requests.head(url + "/sitemap.xml")
        if (resp.status_code != 404):
            l.append(resp.url)
        else:
            resp = requests.head(url + "/robots.txt")
            if (resp.status_code == 200):
                l.append(resp.url)
        self.sitemap_urls = l
        print self.sitemap_urls


def iterloc(it, alt=False):
    for d in it:
        yield d['loc']

        # Also consider alternate URLs (xhtml:link rel="alternate")
        if alt and 'alternate' in d:
            for l in d['alternate']:
                yield l

Error: (error report screenshot)

Configuration: (PyCharm run configuration screenshot); spider file location: (screenshot)

Best Answer

Run/Debug Configuration > Script parameters > crawl [spider name]

In your case, replace [spider name] with test: crawl test
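
If you prefer to debug through a regular Python run configuration, a commonly used alternative (not part of this answer, just a widely used pattern) is to add a small runner module and point PyCharm's Script path at it. A minimal sketch, assuming the spider is named test as above; the file name run_spider.py is only an example:

# run_spider.py -- hypothetical helper; place it in the Scrapy project root
# (next to scrapy.cfg) and select it as the Script path in the PyCharm
# Run/Debug configuration so breakpoints inside the spider are hit.
from scrapy.cmdline import execute

# Equivalent to running "scrapy crawl test" on the command line.
execute(["scrapy", "crawl", "test"])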

Update

If you are not inside a Scrapy project and are just trying to run a single file, you can run your spider with runspider [/file/path].

In your case: runspider items.py
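
The same scrapy.cmdline.execute call can also mirror runspider for a standalone file; a minimal sketch, assuming the spider lives in items.py as in the question:

# run_standalone.py -- hypothetical helper mirroring "scrapy runspider items.py";
# useful when the file is not part of a Scrapy project.
from scrapy.cmdline import execute

execute(["scrapy", "runspider", "items.py"])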

Regarding python - PyCharm Scrapy configuration, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44680348/
