
Pitfalls of sending POST requests with Scrapy in Python

Reposted. Author: qq735679552. Updated: 2022-09-27 22:32:09


Sending POST requests with requests

First, let's look at how convenient it is to send a POST request with requests.

Requests' simple API means that every HTTP request type is self-evident. For example, you can send an HTTP POST request like this:

```python
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
```

data accepts a dict, and it also accepts a tuple of (key, value) tuples, which lets one key carry several values:

```python
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
```

To send JSON, you can serialize it yourself and pass the string as data:

```python
>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))
```

New in version 2.4.2: pass the object through the json parameter instead:

```python
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, json=payload)
```

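In other words, you don't need to transform your parameters at all: you only choose between data= and json=, and requests does the rest, including setting the Content-Type header to application/json when you use json=.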
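To make the difference between data= and json= concrete, here is a small standard-library sketch that approximates the request bodies requests builds. It does not call requests at all, and the helper names form_body and json_body are hypothetical. For reference, requests also sets application/x-www-form-urlencoded as the Content-Type for a dict or list of pairs passed as data=, and application/json for json=.

```python
import json
from urllib.parse import urlencode


def form_body(data):
    """Approximate the body requests sends for data=<dict or (key, value) pairs>."""
    items = data.items() if isinstance(data, dict) else data
    return urlencode(list(items))


def json_body(obj):
    """Approximate the body requests sends for json=<obj>."""
    return json.dumps(obj)


print(form_body({'key': 'value'}))                           # key=value
print(form_body((('key1', 'value1'), ('key1', 'value2'))))   # key1=value1&key1=value2
print(json_body({'some': 'data'}))                           # {"some": "data"}
```

The same dict therefore reaches the server in two very different shapes depending on the parameter you pick, which is exactly what matters in the Scrapy case.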

Sending POST requests with Scrapy

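Scrapy's source code shows that it sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request instead. Take the following as an example: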

```python
import json

import scrapy


# A plain Spider is enough here; the original subclassed CrawlSpider (imported from the
# outdated path scrapy.spider), but CrawlSpider uses parse() internally for its rules.
class LaGou(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
            # formdata plays the role of data in requests; keys and values must be strings
            formdata={
                'first': 'true',  # must be the string 'true', not the bool True (requests accepts True)
                'pn': '1',        # must be the string '1', not the int 1 (requests accepts 1)
                'kd': 'python',
            },
            callback=self.parse,
        )

    def parse(self, response):
        datas = json.loads(response.body.decode())['content']['positionResult']['result']
        for data in datas:
            print(data['companyFullName'] + str(data['positionId']))
```
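Because formdata only accepts strings, a payload written for requests (with True or 1) has to be converted first. A minimal sketch using a hypothetical helper named to_formdata; it assumes the server expects lowercase 'true'/'false', as in the spider above:

```python
def to_formdata(params):
    """Hypothetical helper: turn a dict with bool/int values into the
    str-only dict that FormRequest's formdata expects."""
    out = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # str(True) would give 'True'; the spider above sends lowercase 'true'
            out[key] = 'true' if value else 'false'
        else:
            out[key] = str(value)
    return out


print(to_formdata({'first': True, 'pn': 1, 'kd': 'python'}))
# {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The bool check comes before the generic str() fallback on purpose: bool is a subclass of int, and str(True) produces 'True', which may not be what the server expects.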

The official documentation recommends the following approach, under "Using FormRequest to send data via HTTP POST":

```python
from scrapy.http import FormRequest

# inside a spider callback:
return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]
```

This uses FormRequest and passes the parameters through formdata, which here too is a dict.

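But here comes the really nasty pitfall. I spent a whole afternoon on it today: every time I sent the request this way something went wrong, and the data returned was never what I wanted: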

```python
return scrapy.FormRequest(url, formdata=payload)
```

After searching online for a long time, I finally found a way that works: send the request with scrapy.Request, and the data comes back as expected:

```python
return scrapy.Request(url, body=json.dumps(payload), method='POST',
                      headers={'Content-Type': 'application/json'})
```
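The post does not say why FormRequest failed. One plausible cause, which is an assumption not verified against the author's server, is that the endpoint expected a JSON body: FormRequest always url-encodes formdata, whereas the working version sends json.dumps(payload). The sketch below uses only the standard library, an example payload, and a hypothetical JSON-only handler named hypothetical_json_endpoint:

```python
import json
from urllib.parse import urlencode

payload = {'field1': 'value1', 'field2': 'value2'}

# Roughly what FormRequest(url, formdata=payload) puts in the body
form_encoded = urlencode(payload)
# What Request(url, body=json.dumps(payload), method='POST', ...) puts in the body
json_encoded = json.dumps(payload)


def hypothetical_json_endpoint(body):
    """Hypothetical server handler that only understands JSON bodies."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None  # a real server might answer 400 or return unexpected data


print(form_encoded)                               # field1=value1&field2=value2
print(hypothetical_json_endpoint(form_encoded))   # None
print(hypothetical_json_endpoint(json_encoded))   # {'field1': 'value1', 'field2': 'value2'}
```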

Reference: Send Post Request in Scrapy

```python
import json

import scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(url, method='POST',  # url: the endpoint you are posting to
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
```
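The working request needs three ingredients: an explicit method, a JSON-serialized body, and a matching Content-Type header. As an illustration outside Scrapy (this is the standard library's urllib, not Scrapy's API), a request can be built the same way and inspected without sending it; the httpbin.org URL is only a placeholder:

```python
import json
import urllib.request

my_data = {'field1': 'value1', 'field2': 'value2'}

# Build (but do not send) a POST request with a JSON body, mirroring
# scrapy.Request(url, method='POST', body=json.dumps(my_data), headers=...)
req = urllib.request.Request(
    'http://httpbin.org/post',
    data=json.dumps(my_data).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST',
)

print(req.get_method())                 # POST
print(req.get_header('Content-type'))   # application/json (urllib capitalizes header names)
print(json.loads(req.data))             # {'field1': 'value1', 'field2': 'value2'}
```

Passing req to urllib.request.urlopen would actually send it; that step is left out so the sketch runs offline.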

Differences between FormRequest and Request

In the documentation there is almost no visible difference:

"The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here. Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request."

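In other words, FormRequest adds one new parameter, formdata, which accepts a dict or an iterable of (key, value) tuples containing form data and turns it into the request body. FormRequest is also a subclass of Request: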

```python
# Excerpt of FormRequest from an older Scrapy version (imports added for readability)
from urllib.parse import urlencode

from scrapy.http.request import Request
from scrapy.utils.python import to_bytes, is_listlike


class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
        # ... (rest of the class omitted)


def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
```

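So the {'key': 'value', 'k': 'v'} we pass is eventually converted into 'key=value&k=v', and the method defaults to POST when formdata is given and no method is specified. Now let's look at Request: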
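The conversion just described can be sketched with the standard library. encode_formdata and apply_formdata are hypothetical names; list values are detected with a simple list/tuple check rather than Scrapy's is_listlike, and strings are not converted to bytes as Scrapy does:

```python
from urllib.parse import urlencode


def encode_formdata(formdata):
    """Mirror _urlencode: accept a dict or (key, value) pairs and expand
    list/tuple values into repeated keys."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    pairs = [(k, v)
             for k, vs in items
             for v in (vs if isinstance(vs, (list, tuple)) else [vs])]
    return urlencode(pairs)


def apply_formdata(url, formdata, method='POST'):
    """Hypothetical helper: return (url, body, headers) the way FormRequest would."""
    query = encode_formdata(formdata)
    if method == 'POST':
        # POST: the encoded data becomes the body, with a form Content-Type
        return url, query, {'Content-Type': 'application/x-www-form-urlencoded'}
    # Other methods: the encoded data is appended to the URL's query string
    sep = '&' if '?' in url else '?'
    return url + sep + query, '', {}


print(encode_formdata({'key': 'value', 'k': 'v'}))   # key=value&k=v
print(apply_formdata('http://www.example.com/search?x=1', {'q': 'scrapy'}, method='GET'))
# ('http://www.example.com/search?x=1&q=scrapy', '', {})
```

Whatever you pass as formdata, the result is always form-encoded text, never JSON.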

```python
# Excerpt of Request from an older Scrapy version (import added for readability)
from scrapy.utils.trackref import object_ref


class Request(object_ref):

    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):

        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()
        # ... (rest of __init__ omitted)
```

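Request defaults to GET, but that doesn't matter: passing method='POST' still sends a POST request. This reminds me of request() in requests, the basic function that defines a request: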

```python
# From requests/api.py; `sessions` is imported at module level there (from . import sessions)
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
```


Original post: https://zhangslob.github.io/2018/08/24/使用scrapy发送post请求的坑/
