Pitfalls of sending POST requests with Scrapy in Python

Reposted by: qq735679552. Last updated: 2022-09-27 22:32:09


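Sending POST requests with requests

First, let's look at how convenient it is to send a POST request with requests.

Requests' simple API means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request: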

 
    >>> import requests
    >>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})

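The data argument accepts a dict, and it can also accept a sequence of (key, value) tuples, which lets the same key appear more than once.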

 
    >>> payload = (('key1', 'value1'), ('key1', 'value2'))
    >>> r = requests.post('http://httpbin.org/post', data=payload)
    >>> print(r.text)
    {
      ...
      "form": {
        "key1": [
          "value1",
          "value2"
        ]
      },
      ...
    }
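To see why httpbin reports a list for key1, it can help to look at the form-encoded body itself. The following is a minimal standard-library sketch, not what requests runs internally; it only assumes that a form body is the usual `key=value` pairs joined by `&`.

```python
from urllib.parse import urlencode, parse_qs

# Same payload as above: the key 'key1' appears twice.
payload = (('key1', 'value1'), ('key1', 'value2'))

# A form-encoded body is "key=value" pairs joined by "&".
body = urlencode(payload)
print(body)  # key1=value1&key1=value2

# A server decoding that body sees both values under the one key.
print(parse_qs(body))  # {'key1': ['value1', 'value2']}
```

A dict cannot hold the same key twice, which is the practical reason to reach for tuples here.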

Passing JSON looks like this.

 
    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, data=json.dumps(payload))

New in version 2.4.2:

 
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, json=payload)

In other words, you do not have to transform the parameters yourself. You only need to decide between data= and json=, and requests takes care of the rest.
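The difference between the two keywords comes down to how the same dict is serialized. The sketch below uses only the standard library to show roughly what each body looks like; it is an approximation for illustration, and the variable names are made up here rather than taken from requests.

```python
import json
from urllib.parse import urlencode

payload = {'some': 'data'}

# Roughly what data=payload sends: a form-encoded body.
form_type = 'application/x-www-form-urlencoded'
form_body = urlencode(payload)

# Roughly what json=payload sends: a JSON document.
json_type = 'application/json'
json_body = json.dumps(payload)

print(form_type, form_body)  # application/x-www-form-urlencoded some=data
print(json_type, json_body)  # application/json {"some": "data"}
```

A server that expects one of these formats will usually not understand the other, which is worth keeping in mind for the Scrapy section that follows.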

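Sending POST requests with scrapy

The source code shows that scrapy sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request. Take the following spider as an example.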

 
    from scrapy.spiders import CrawlSpider  # the original used the old scrapy.spider path
    from scrapy.selector import Selector
    import scrapy
    import json


    class LaGou(CrawlSpider):
        name = 'myspider'

        def start_requests(self):
            yield scrapy.FormRequest(
                url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
                formdata={
                    'first': 'true',  # must not be the bool True here, although the requests module accepts it
                    'pn': '1',  # must not be the int 1 here, although the requests module accepts it
                    'kd': 'python'
                },  # formdata plays the role of data in the requests module; it only takes key/value pairs
                callback=self.parse
            )

        def parse(self, response):
            # The endpoint returns JSON; drill down to the list of job postings.
            datas = json.loads(response.body.decode())['content']['positionResult']['result']
            for data in datas:
                print(data['companyFullName'] + str(data['positionId']))
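Because formdata wants string values, parameters that start life as bool or int have to be converted before they reach FormRequest. One possible way to do that is sketched below. `to_formdata` is a hypothetical helper, not part of scrapy, and the lowercase 'true'/'false' spelling is an assumption based on the 'first': 'true' entry in the spider.

```python
def to_formdata(params):
    """Return a copy of params with every key and value as str (hypothetical helper)."""
    formdata = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: the target site expects lowercase 'true' / 'false'.
            value = 'true' if value else 'false'
        formdata[str(key)] = str(value)
    return formdata


params = {'first': True, 'pn': 1, 'kd': 'python'}
print(to_formdata(params))  # {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The bool check comes first because bool is a subclass of int in Python, and str(True) would otherwise produce 'True'.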

The official documentation recommends this approach under "Using FormRequest to send data via HTTP POST":

 
    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]

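This uses FormRequest and passes the parameters through formdata, which is again a dict.

But here comes the really nasty pitfall. I spent a whole afternoon on it: no matter how I sent the request this way, something went wrong, and the data that came back was never what I wanted.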

 
    return scrapy.FormRequest(url, formdata=(payload))

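After a long search online I finally found a way that works: send the request with scrapy.Request, passing the payload as a JSON body, and the data comes back correctly.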

    return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})

Reference: Send Post Request in Scrapy.

 
    my_data = {'field1': 'value1', 'field2': 'value2'}
    request = scrapy.Request(url, method='POST',
                             body=json.dumps(my_data),
                             headers={'Content-Type': 'application/json'})
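If several requests in a spider need a JSON body, the three arguments that matter (method, body, headers) can be built in one place. This is only a sketch of one way to organize it. `json_post_kwargs` is a hypothetical helper name, and the scrapy call is left as a comment so the snippet runs without scrapy installed.

```python
import json


def json_post_kwargs(payload):
    """Build keyword arguments for a JSON POST request (hypothetical helper)."""
    return {
        'method': 'POST',
        'body': json.dumps(payload),
        'headers': {'Content-Type': 'application/json'},
    }


kwargs = json_post_kwargs({'field1': 'value1', 'field2': 'value2'})
print(kwargs['body'])  # {"field1": "value1", "field2": "value2"}

# Inside a spider this could be used as (not executed here):
#     yield scrapy.Request(url, callback=self.parse, **kwargs)
```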
        
 
     
 
   

Differences between FormRequest and Request

The documentation shows almost no difference between the two:

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here. Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

That is, FormRequest adds one new argument, formdata, which accepts a dict or an iterable of tuples containing the form data and turns it into the request body. FormRequest also inherits from Request.

 
    class FormRequest(Request):

        def __init__(self, *args, **kwargs):
            formdata = kwargs.pop('formdata', None)
            if formdata and kwargs.get('method') is None:
                kwargs['method'] = 'POST'

            super(FormRequest, self).__init__(*args, **kwargs)

            if formdata:
                items = formdata.items() if isinstance(formdata, dict) else formdata
                querystr = _urlencode(items, self.encoding)
                if self.method == 'POST':
                    self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                    self._set_body(querystr)
                else:
                    self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

        ###


    def _urlencode(seq, enc):
        values = [(to_bytes(k, enc), to_bytes(v, enc))
                  for k, vs in seq
                  for v in (vs if is_listlike(vs) else [vs])]
        return urlencode(values, doseq=1)

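In the end, the {'key': 'value', 'k': 'v'} we pass in is converted to 'key=value&k=v' and sent with the Content-Type application/x-www-form-urlencoded, and the method defaults to POST whenever formdata is given. Now let's look at Request.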
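The conversion can be reproduced outside scrapy to see exactly what ends up in the body. The sketch below is a simplified stand-in written with the standard library. `to_bytes` and `form_urlencode` here are rewritten for illustration and are not scrapy's own functions, and the list-like check is narrowed to list and tuple as an assumption.

```python
from urllib.parse import urlencode


def to_bytes(text, encoding='utf-8'):
    # Simplified stand-in: only str or bytes are accepted, anything else is rejected.
    if isinstance(text, bytes):
        return text
    if not isinstance(text, str):
        raise TypeError('expected str or bytes, got %s' % type(text).__name__)
    return text.encode(encoding)


def form_urlencode(items, encoding='utf-8'):
    # Flatten list-like values so that one key may carry several values.
    values = [(to_bytes(k, encoding), to_bytes(v, encoding))
              for k, vs in items
              for v in (vs if isinstance(vs, (list, tuple)) else [vs])]
    return urlencode(values, doseq=True)


print(form_urlencode({'key': 'value', 'k': 'v'}.items()))  # key=value&k=v
print(form_urlencode([('tag', ['a', 'b'])]))               # tag=a&tag=b

try:
    form_urlencode({'pn': 1}.items())
except TypeError as exc:
    print('rejected:', exc)  # rejected: expected str or bytes, got int
```

Under these assumptions, a non-string value fails at the bytes conversion step, which would be consistent with the note in the spider that formdata cannot take an int or a bool. It also shows that the body is form-encoded, so an endpoint that expects JSON would not be able to read it.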

 
    class Request(object_ref):

        def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                     cookies=None, meta=None, encoding='utf-8', priority=0,
                     dont_filter=False, errback=None, flags=None):

            self._encoding = encoding  # this one has to be set first
            self.method = str(method).upper()
        
 
     
 
   

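The default method is GET, but that is not a limitation: passing method='POST' still sends a POST request. This reminds me of the request function in requests, which is the base function that all the other request helpers are built on.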

 
    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
        :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'http://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)


Original link: https://zhangslob.github.io/2018/08/24/使用scrapy发送post请求的坑/
