
python - ValueError in scrapy __init__ arg

Reposted. Author: 行者123. Updated: 2023-12-01 08:39:35

When I run this command in cmd:

scrapy crawl quotes -o item.csv -a u=test_user_name -a p=test_passporw_name -a urls=http://books.toscrape.com/

it shows:

raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h

# -*- coding: utf-8 -*-
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request, FormRequest
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import Rule
from scrapy.utils.response import open_in_browser
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class QuotesSpider(InitSpider):
    name = 'quotes'
    allowed_domains = ['quotes.toscrape.com']
    login_page='http://quotes.toscrape.com/login'
    start_urls = ['']
    username=''
    password=''

    def __init__(self,u,p,urls):
        self.username=u
        self.password=p
        self.start_urls=urls

    def init_request(self):
        #"""This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        csrf_token=response.xpath('//*[@name="csrf_token"]//@value').extract_first()
        return FormRequest.from_response(response,
                                         formdata={'csrf_token': csrf_token,
                                                   'username': self.username,
                                                   'password': self.password,
                                                   },
                                         callback=self.check_login_response)

    def check_login_response(self, response):
        # open_in_browser(response)
        #"""Check the response returned by a login request to see if we are successfully logged in."""
        if "Logout" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..

            return self.initialized() # ****THIS LINE FIXED THE LAST PROBLEM*****

        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        open_in_browser(response)

Best answer

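self.start_urls=urls makes start_urls a string instead of a list.
Scrapy iterates over start_urls, so each character of that string is treated as a separate URL; the first one, "h", has no scheme, which is exactly the error you see.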
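The effect can be reproduced without Scrapy. The sketch below assumes only that the spider loops over start_urls (as Scrapy's default start_requests() does); `has_scheme` is a hypothetical, rough stand-in for the check behind the "Missing scheme" error, not Scrapy's actual code.

```python
from urllib.parse import urlparse


def has_scheme(url: str) -> bool:
    # Rough stand-in for the check behind Scrapy's "Missing scheme" error;
    # this is NOT Scrapy's implementation.
    return bool(urlparse(url).scheme)


urls = "http://books.toscrape.com/"  # what "-a urls=..." delivers: a plain str

# Wrong: iterating a str yields single characters, so the first "URL" is "h".
as_string = [item for item in urls]
print(as_string[:4])             # ['h', 't', 't', 'p']
print(has_scheme(as_string[0]))  # False, hence "Missing scheme in request url: h"

# Right: wrapping the str in a list yields the whole URL exactly once.
as_list = [item for item in [urls]]
print(as_list)                   # ['http://books.toscrape.com/']
print(has_scheme(as_list[0]))    # True
```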

Just make start_urls a list and your code should work:

self.start_urls = [urls]
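If you ever need several start URLs from a single `-a` argument, one option is to split the string yourself. The comma separator and the `split_urls_arg` helper are hypothetical conventions for this sketch, not something Scrapy provides; `__init__` would then assign `self.start_urls = split_urls_arg(urls)`.

```python
from __future__ import annotations


def split_urls_arg(urls: str) -> list[str]:
    """Split a hypothetical comma-separated "-a urls=..." value into a list.

    The comma convention is an assumption of this sketch, not Scrapy behaviour.
    """
    # Strip whitespace around each URL and drop empty entries.
    return [url.strip() for url in urls.split(",") if url.strip()]


# A single URL still becomes a one-element list, like [urls] in the answer.
print(split_urls_arg("http://books.toscrape.com/"))  # ['http://books.toscrape.com/']

# Several comma-separated URLs become several list items.
print(split_urls_arg("http://books.toscrape.com/, http://quotes.toscrape.com/"))
```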

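Also, you don't need to initialize the variables to dummy values, and you don't need to extract the csrf_token yourself: FormRequest.from_response() does that automatically, because it pre-fills the request with the form's existing fields, including hidden ones.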
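To illustrate the idea, here is a simplified, stdlib-only model of what "pre-fills the form's fields" means: collect the `<input>` name/value pairs of the first form on the page, then let user-supplied values override them. `FormFieldCollector`, `build_formdata`, and the sample HTML are hypothetical; this is not Scrapy's implementation, which also handles selects, checkboxes, submit buttons, and form selection.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements inside the first <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
        elif tag == "input" and self._in_form and attributes.get("name"):
            # Inputs without a value attribute (value is None) become "".
            self.fields[attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is considered


def build_formdata(html: str, overrides: dict[str, str]) -> dict[str, str]:
    collector = FormFieldCollector()
    collector.feed(html)
    # Fields found in the page come first; user-supplied values take precedence.
    return {**collector.fields, **overrides}


LOGIN_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

data = build_formdata(LOGIN_HTML, {"username": "test_user_name", "password": "secret"})
print(data)  # {'csrf_token': 'abc123', 'username': 'test_user_name', 'password': 'secret'}
```

The hidden csrf_token ends up in the submitted data even though the caller only supplied the credentials, which is why the manual XPath extraction in the question is unnecessary.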


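By the way, your code looks like it was written for a fairly old version of scrapy: most of those imports have since been moved, renamed, or deprecated.
It may be worth updating your code after a quick re-read of the documentation.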

Regarding python - ValueError in scrapy __init__ arg, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53569335/
