python-3.x - Python3: urllib.error.HTTPError: HTTP Error 403: Forbidden

Reposted. Author: 行者123. Updated: 2023-12-04 01:07:18

Please help me!

I am using Python 3.3 and this code:

import urllib.request
import sys
Open_Page = urllib.request.urlopen(
"http://wowcircle.com"
).read().decode().encode('utf-8')

I get this:

Traceback (most recent call last):
  File "C:\Users\1\Desktop\WCLauncer\reg.py", line 5, in <module>
    "http://forum.wowcircle.com"
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

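I understand that I am not allowed to access the site wowcircle.com. But I only want to fetch the page source! I believe that should be possible without access rights, but how?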

Best answer

I suggest you set the headers accordingly. Look at what your browser sends (with an HTTP header plugin).

A function could look like this:

import urllib  # Python 2 module; the traceback further down comes from Python 2.7

def openAsOpera(url):
    # Python 2 API. In Python 3 the class is urllib.request.URLopener (deprecated since 3.3).
    u = urllib.URLopener()
    u.addheaders = []  # drop the default headers, including the Python User-Agent
    u.addheader('User-Agent', 'Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01')
    u.addheader('Accept-Language', 'de-DE,de;q=0.9,en;q=0.8')
    u.addheader('Accept', 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1')
    f = u.open(url)
    content = f.read()
    f.close()
    return content

This gets you past some errors on web pages that expect more from a client than the bare default request.
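For Python 3, the same idea can be sketched with urllib.request.Request instead of URLopener. This is only a minimal sketch: the helper names build_browser_request and fetch_as_browser are made up for illustration, the header values are taken from the function above (with a shortened Accept list), and nothing here guarantees that a given site will accept the request.

```python
import urllib.request

# Header values taken from the answer above; the Accept list is shortened.
BROWSER_HEADERS = {
    "User-Agent": "Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01",
    "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    "Accept": "text/html, application/xml;q=0.9, application/xhtml+xml, */*;q=0.1",
}


def build_browser_request(url):
    """Return a Request carrying browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_as_browser(url, timeout=10):
    """Open the URL with the headers above and return the raw body bytes."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch_as_browser('http://wowcircle.com/') would send the request. urlopen follows redirects on its own, so a redirect loop like the one in the question's traceback can still end in an HTTPError.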

Now I get this error:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    s = openAsOpera('http://wowcircle.com/')
  File "C:....pyw", line 522, in openAsOpera
    f = u.open(url)
  File "C:\Python27\lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 359, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "C:\Python27\lib\urllib.py", line 376, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "C:\Python27\lib\urllib.py", line 381, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 302, 'Moved Temporarily', <httplib.HTTPMessage instance at 0x02C8F1C0>)

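This means you now have access, because you imitated the request of a real browser: the server answers with a 302 redirect instead of 403.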

>>> try: s = openAsOpera('http://wowcircle.com/?pmtry=1')
except: import sys; ty, err, tb = sys.exc_info()

>>> err.args[3].headers
['Server: nginx\r\n', 'Date: Sat, 05 Apr 2014 07:42:00 GMT\r\n', 'Content-Type: text/html\r\n', 'Content-Length: 154\r\n', 'Connection: close\r\n', 'Set-Cookie: PMBC=9979187990a58a5bfdaa6d1380ad6156; path=/\r\n', 'Location: http://wowcircle.com/?pmtry=1\r\n']
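The two headers that matter in this output are Location and Set-Cookie. As a rough sketch of how they could be pulled out of such a list of raw header lines in Python 3, the hypothetical helper below (summarize_redirect is not part of the original answer) parses the lines with the standard email and http.cookies modules. The sample input is a subset of the headers shown above.

```python
from email.parser import Parser
from http.cookies import SimpleCookie


def summarize_redirect(raw_header_lines):
    """Return (location, cookies) parsed from raw HTTP header lines."""
    # Join the lines and add the blank line that ends a header block.
    message = Parser().parsestr("".join(raw_header_lines) + "\r\n")
    cookies = {}
    for raw_cookie in message.get_all("Set-Cookie", []):
        parsed = SimpleCookie()
        parsed.load(raw_cookie)
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return message["Location"], cookies


# Subset of the headers printed in the session above.
sample_headers = [
    "Server: nginx\r\n",
    "Set-Cookie: PMBC=9979187990a58a5bfdaa6d1380ad6156; path=/\r\n",
    "Location: http://wowcircle.com/?pmtry=1\r\n",
]
location, cookies = summarize_redirect(sample_headers)
print(location)
print(cookies)
```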

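Things to note: the redirect goes to this location: http://wowcircle.com/?pmtry=1 and then to this one: http://wowcircle.com/?pmtry=2. The pmtry value counts up, and the server seems to be waiting for a cookie.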

So the result of my analysis is: do not forget to send the cookie on every visit to the site.
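One way to send the cookie back on every request in Python 3 is an opener with a cookie jar attached. The sketch below is an assumption about how that could look, not code from the original answer: make_cookie_opener and fetch_with_cookies are hypothetical names, and whether the site then serves the page has not been verified here.

```python
import http.cookiejar
import urllib.request

USER_AGENT = "Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01"


def make_cookie_opener():
    """Build an opener that stores cookies and resends them (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor records Set-Cookie headers and adds matching Cookie
    # headers to later requests, including the ones issued while following redirects.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default Python-urllib User-Agent with a browser-like one.
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def fetch_with_cookies(url, timeout=10):
    """Fetch the URL through the cookie-aware opener; return (body bytes, jar)."""
    opener, jar = make_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read(), jar
```

Reusing the same opener for later requests keeps the cookie jar, so the cookie set during the first redirect is sent again on each subsequent visit.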

Regarding python-3.x - Python3: urllib.error.HTTPError: HTTP Error 403: Forbidden, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22877619/
