
python - How to correctly make requests to an https site using a proxy server (like luminati.io)?


This is the API sample provided by the premium proxy provider luminati.io. However, it returns the result as bytes rather than a dictionary, so I convert it to a dictionary in order to extract the ip and port:

Each request ends up with a new peer proxy, because the IP rotates on every request.

import csv
import requests
import json
import time

#!/usr/bin/env python

# Luminati sample code: fetch the current peer's details through the proxy.
print('If you get error "ImportError: No module named \'six\'"'+\
      'install six:\n$ sudo pip install six');
import sys
if sys.version_info[0]==2:
    import six
    from six.moves.urllib import request
    opener = request.build_opener(
        request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()
if sys.version_info[0]==3:
    import urllib.request
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(
            {'http': 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'}))
    proxy_details = opener.open('http://lumtest.com/myip.json').read()

# The response is JSON bytes; parse it into a dictionary.
proxy_dictionary = json.loads(proxy_details)

print(proxy_dictionary)

Then I plan to use that ip and port with the requests module to connect to the site I'm interested in:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number = int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"https":proxy})

But my problem is that it errors out at the requests call. When I change proxies={"https":proxy} to proxies={"http":proxy}, it got through once, but otherwise the proxy connection keeps failing.
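As I understand it, requests selects the proxy by matching the scheme of the target URL against the keys of the proxies dictionary, and each value should itself be a URL with a scheme. A minimal sketch of that mapping (the host and port below are placeholders, not my real proxy details):

import requests

# Hypothetical proxy address, used only to illustrate the proxies mapping.
proxy_url = 'http://203.0.113.10:20005'

# requests picks the entry whose key matches the scheme of the requested URL:
# the "http" entry for http:// links, the "https" entry for https:// links.
proxies = {'http': proxy_url, 'https': proxy_url}

resp = requests.get('http://lumtest.com/myip.json', proxies=proxies, timeout=30)
print(resp.status_code)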

Sample output:

print_dictionary = {'ip': '84.22.151.191', 'country': 'RU', 'asn': {'asnum': 57129, 'org_name': 'Optibit LLC'}, 'geo': {'city': 'Krasnoyarsk', 'region': 'KYA', 'postal_code': '660000', 'latitude': 56.0097, 'longitude': 92.7917, 'tz': 'Asia/Krasnoyarsk'}}

The screenshot below shows the peer proxy details: Peer proxy

print(proxy) produces 84.22.151.191:57129, which is what gets passed into the requests.get method.

The error I get:

(Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x00000282DDD592B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))

I tested removing the proxies={"https":proxy} argument from the requests call, and the scraping worked without errors. So either the proxy or the way I'm using it is the problem.

Best Answer

proxies={"https":proxy} 更改为 proxies={"http":proxy} 时,您还必须确保您的链接是 http 而不是 https 所以也尝试替换:

link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'

with:

link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'

Your overall code should then look like this:

headers = {'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

if __name__ == "__main__":

    search_keyword = input("Enter the search keyword: ")
    page_number = int(input("Enter total number of pages: "))

    for i in range(1,page_number+1):
        time.sleep(10)

        link = 'http://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page='+str(i)+'&q='+str(search_keyword)+'&spm=a2o4l.home.search.go.239e6ef06RRqVD'
        proxy = proxy_dictionary["ip"] + ':' + str(proxy_dictionary["asn"]["asnum"])
        print(proxy)
        req = requests.get(link,headers=headers,proxies={"http":proxy})
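If the link has to stay https, one possible variation (a minimal sketch only, assuming the Luminati endpoint and zone credentials shown in the question's sample code are still valid) is to pass that endpoint, with an explicit scheme, under both the "http" and "https" keys so requests can also tunnel https traffic through it:

import requests

# Assumption: the proxy endpoint and credentials below are the ones shown in
# the question's sample code; substitute your own zone credentials.
lum_proxy = 'http://lum-customer-hl_1247574f-zone-static:lnheclanmc@127.0.3.1:20005'
proxies = {'http': lum_proxy, 'https': lum_proxy}

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}
link = 'https://www.experiment.com.ph/catalog/?_keyori=ss&ajax=true&from=input&page=1&q=test'

req = requests.get(link, headers=headers, proxies=proxies, timeout=30)
print(req.status_code)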

Hope this helps!

Regarding "python - How to correctly make requests to an https site using a proxy server (like luminati.io)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54156628/
