
javascript - Filling in a POST form with Scrapy

Reposted. Author: 行者123. Last updated: 2023-11-29 21:47:24

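I am trying to fill in a POST form with Scrapy in order to book a train ticket.

I thought the FormRequest class could do this, but I cannot get it to work with the JavaScript form. The Scrapy spider returns nothing.

The file I am using should be enough to submit the form.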

import scrapy

from scrapy.item import Item, Field
from scrapy.http import FormRequest


class SncfItem(Item):
    title = Field()
    link = Field()
    desc = Field()


class SncfSpider(scrapy.Spider):
    name = "sncf"
    allowed_domains = ["voyages-sncf.com"]
    start_urls = (
        'http://www.voyages-sncf.com/billet-train',
    )

    def parse(self, response):
        # Pre-fill the form named 'saisie' on the page and submit it.
        yield FormRequest.from_response(
            response,
            formname='saisie',
            formdata={'ORIGIN_CITY': 'Gare de Lyon (Paris)',
                      'DESTINATION_CITY': 'Lyon Part-Dieu',
                      'OUTWARD_DATE': '03.06.2015'},
            callback=self.parse1)

    def parse1(self, response):
        # Print the HTTP status of the page returned after the submission.
        print(response.status)

Can anyone tell me whether I am missing a step in my spider?

Any help would be appreciated.
Thanks

scrapy crawl sncf -o items.xml  -t xml
2015-06-01 23:13:54+0200 [sncf] INFO: Spider opened
2015-06-01 23:13:54+0200 [sncf] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-01 23:13:54+0200 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6024
2015-06-01 23:13:54+0200 [scrapy] DEBUG: Web service listening on 127.0.0.1:6081
2015-06-01 23:13:55+0200 [sncf] DEBUG: Crawled (200) <GET http://www.voyages-sncf.com/billet-train> (referer: None)
2015-06-01 23:13:56+0200 [sncf] DEBUG: Redirecting (302) to <GET http://www.voyages-sncf.com/billet-train> from <POST http://www.voyages-sncf.com/vsc/train-ticket/>
2015-06-01 23:13:56+0200 [sncf] DEBUG: Crawled (200) <GET http://www.voyages-sncf.com/billet-train> (referer: http://www.voyages-sncf.com/billet-train)
200
2015-06-01 23:13:56+0200 [sncf] INFO: Closing spider (finished)

Best answer

You are being redirected because the date format is invalid.
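In the log, this appears as a 302 from the POST back to the form page, followed by a 200, so the status code alone hides the failure. Below is a minimal sketch of how a callback could notice such a bounce. It relies on Scrapy's redirect middleware recording the URLs a request passed through under the `redirect_urls` key of the request meta. The helper name `was_bounced_back` is hypothetical and not part of Scrapy.

```python
def was_bounced_back(final_url, form_page_url, redirect_urls=()):
    """Return True if the request was redirected and ended up on the form page.

    final_url      -- URL of the response the callback received
    form_page_url  -- URL of the page that holds the form
    redirect_urls  -- URLs the request went through before the final response
    """
    # No redirect happened at all: the submission was not bounced.
    if not redirect_urls:
        return False
    # Ignore a trailing slash when comparing the two URLs.
    return final_url.rstrip('/') == form_page_url.rstrip('/')


FORM_PAGE = 'http://www.voyages-sncf.com/billet-train'
POST_URL = 'http://www.voyages-sncf.com/vsc/train-ticket/'

# Mirrors the log: the POST was redirected (302) back to the form page.
bounced = was_bounced_back(FORM_PAGE, FORM_PAGE, [POST_URL])
print(bounced)
```

Inside `parse1`, the same check could be called as `was_bounced_back(response.url, FORM_PAGE, response.meta.get('redirect_urls', []))`.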

I replayed the request in the Scrapy shell, as follows:

$ scrapy shell http://www.voyages-sncf.com/billet-train
.... a few log messages later, I get a shell with the response...
>>> from scrapy.http import FormRequest
>>> # first I recreate the FormRequest per your code:
>>> fr = FormRequest.from_response(response,
...     formname='saisie',
...     formdata={'ORIGIN_CITY': 'Gare de Lyon (Paris)',
...               'DESTINATION_CITY': 'Lyon Part-Dieu',
...               'OUTWARD_DATE': '03.06.2015'})
>>> # checked the url and method:
>>> fr.url
'http://www.voyages-sncf.com/vsc/train-ticket/'
>>> fr.method
'POST'
>>> fetch(fr) # execute the request
>>> view(response) # opened the result in browser

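Looking at the result, I saw a date validation error message that reads: "Le format de la date d'aller que vous avez saisi est incorrect. Nous vous invitons à utiliser le calendrier." (roughly: "The format of the outward date you entered is incorrect. Please use the calendar.")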
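The message does not say which format the site accepts, so the fix is to send the date in whatever form the calendar widget writes into the field. The sketch below builds the form data from a `datetime.date` with a configurable format. The `DD/MM/YYYY` pattern is only an assumption for illustration and should be checked against the real form. `build_formdata` and `ASSUMED_DATE_FORMAT` are hypothetical names.

```python
from datetime import date

# ASSUMPTION: the site expects DD/MM/YYYY. The source does not confirm this;
# inspect the value the calendar widget produces and adjust if needed.
ASSUMED_DATE_FORMAT = '%d/%m/%Y'


def build_formdata(origin, destination, outward, date_format=ASSUMED_DATE_FORMAT):
    """Return the fields to override in the 'saisie' form, all as strings."""
    return {
        'ORIGIN_CITY': origin,
        'DESTINATION_CITY': destination,
        'OUTWARD_DATE': outward.strftime(date_format),
    }


formdata = build_formdata('Gare de Lyon (Paris)', 'Lyon Part-Dieu', date(2015, 6, 3))
print(formdata['OUTWARD_DATE'])
```

The resulting dictionary can be passed as the `formdata` argument of `FormRequest.from_response` in place of the hard-coded one. Keeping the format in a single constant makes it easy to change once the accepted format is known.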

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/30599105/
