python - How do I correctly make requests to https through a proxy server (such as luminati.io)?

Reposted by: 太空狗 · Updated: 2023-10-29 20:31:46

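This is the API offered by the premium proxy provider luminati.io. It returns the proxy details as bytes rather than as a dictionary, so I convert the response to a dictionary in order to extract the ip and port:

Every request ends up going through a new peer proxy, because the IP rotates on each request.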

#!/usr/bin/env python
import csv
import json
import sys
import time

import requests

print('If you get error "ImportError: No module named \'six\'", '
      'install six:\n$ sudo pip install six')

# The same proxy endpoint is used on Python 2 and Python 3.
PROXY_URL = 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'

if sys.version_info[0] == 2:
    import six
    from six.moves.urllib import request
else:
    import urllib.request as request

opener = request.build_opener(request.ProxyHandler({'http': PROXY_URL}))
# read() returns bytes, so decode before parsing the JSON into a dictionary.
proxy_details = opener.open('http://lumtest.com/myip.json').read()
proxy_dictionary = json.loads(proxy_details.decode('utf-8'))

print(proxy_dictionary)
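The bytes-to-dictionary step can be tried without touching the network. The sketch below is only an illustration: `raw` is a hand-written stand-in for the response body (its values are copied from the sample output given in the question, not from a live call), and `parse_proxy_details` is a hypothetical helper name. Decoding as UTF-8 is an assumption about how the service encodes its reply.

```python
import json

# Illustrative stand-in for the bytes returned by opener.open(...).read();
# the values are copied from the sample output in the question.
raw = (b'{"ip": "84.22.151.191", "country": "RU", '
       b'"asn": {"asnum": 57129, "org_name": "Optibit LLC"}}')


def parse_proxy_details(payload):
    """Decode a JSON payload (bytes or str) into a dictionary."""
    if isinstance(payload, bytes):
        # Assumption: the service replies in UTF-8.
        payload = payload.decode("utf-8")
    return json.loads(payload)


details = parse_proxy_details(raw)
print(details["ip"])
print(details["asn"]["asnum"])
```

Decoding explicitly keeps the same code working on Python 2 and on Python 3 releases older than 3.6, where `json.loads` did not accept bytes.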

I then intend to use that ip and port with the requests module to connect to the site I am interested in:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number =  int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"https":proxy})

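My problem is that it fails at the requests call. When I changed proxies={"https":proxy} to proxies={"http":proxy}, the request went through once, but every other time the proxy connection failed.

Sample output: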

print_dictionary = {'ip': '84.22.151.191', 'country': 'RU', 'asn': {'asnum': 57129, 'org_name': 'Optibit LLC'}, 'geo': {'city': 'Krasnoyarsk', 'region': 'KYA', 'postal_code': '660000', 'latitude': 56.0097, 'longitude': 92.7917, 'tz': 'Asia/Krasnoyarsk'}}

The figure below shows the details of the peer proxy:

print(proxy) produces 84.22.151.191:57129, which is what gets passed to requests.get.

The error I get:

(Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000282DDD592B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))

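I tested removing the proxies={"https":proxy} argument from the requests call, and the scraping then works without errors. So something is wrong either with the proxy or with the way I am accessing it.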
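One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: `asn.asnum` in the sample output sits next to `org_name`, which suggests an autonomous system number rather than a listening port, so `84.22.151.191:57129` may simply have nothing accepting connections. The sketch below assumes instead that requests should go through the same authenticated endpoint used in the first snippet (host `127.0.3.1`, port `20005`). `build_proxies`, `fetch`, `USERNAME` and `PASSWORD` are hypothetical names and placeholders, not part of any Luminati API.

```python
from urllib.parse import quote


def build_proxies(host, port, username, password):
    """Return a requests-style proxies mapping covering http and https targets."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    auth = "{}:{}".format(quote(username, safe=""), quote(password, safe=""))
    proxy_url = "http://{}@{}:{}".format(auth, host, port)
    # Keys name the scheme of the *target* URL; the value is the proxy's own URL.
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxies, headers=None):
    """Fetch url through the proxy; needs the third-party requests package."""
    import requests  # imported lazily so build_proxies has no dependencies
    return requests.get(url, headers=headers, proxies=proxies, timeout=30)


# Host and port come from the first snippet; the credentials are placeholders.
proxies = build_proxies("127.0.3.1", 20005, "USERNAME", "PASSWORD")
print(proxies["https"])
```

With a mapping like this, an `https://` link is also sent through the proxy because the `https` key is present. Whether a given zone accepts HTTPS targets depends on the account configuration, which the question does not state.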

Best answer

When you change proxies={"https":proxy} to proxies={"http":proxy}, you also have to make sure that your link uses http rather than https, so try replacing:

link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'

with

link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
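The reason the key and the link have to agree is that the `proxies` mapping is looked up by the scheme of the URL being requested. The sketch below is a deliberately simplified model of that lookup, written with the standard library only. `proxy_for` is a hypothetical helper, and the real lookup in requests also understands host-specific and `all` keys.

```python
from urllib.parse import urlsplit


def proxy_for(url, proxies):
    """Simplified lookup: pick the entry whose key equals the URL scheme."""
    scheme = urlsplit(url).scheme
    # None means no entry matches, so the request would not use a proxy.
    return proxies.get(scheme)


only_http = {"http": "84.22.151.191:57129"}
print(proxy_for("http://www.experiment.com.ph/catalog/", only_http))
print(proxy_for("https://www.experiment.com.ph/catalog/", only_http))
```

Under this model, an `https://` link combined with only an `http` key finds no proxy entry at all. That may explain why the request in the question succeeded once after the key was changed, although the source does not confirm it.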

Your overall code should look like this:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number =  int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"http":proxy})
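The link in the loop is assembled by string concatenation, which leaves a search keyword containing spaces or other special characters unescaped. As an optional refinement, and not part of the original answer, the query string could be built with `urllib.parse.urlencode`. `BASE_URL` and `build_link` are hypothetical names; the parameter names and values are copied from the link above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.experiment.com.ph/catalog/"


def build_link(search_keyword, page):
    """Build the catalog URL for one results page, escaping the keyword."""
    params = [
        ("_keyori", "ss"),
        ("ajax", "true"),
        ("from", "input"),
        ("page", page),
        ("q", search_keyword),
        ("spm", "a2o4l.home.search.go.239e6ef06RRqVD"),
    ]
    return BASE_URL + "?" + urlencode(params)


print(build_link("usb cable", 2))
```

Inside the loop this would replace the concatenation with `link = build_link(search_keyword, i)`.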

Hope this helps!

This post, python - How do I correctly make requests to https through a proxy server (such as luminati.io)?, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54156628/
