
Python3 - urllib.error.HTTPError: HTTP Error 403: Forbidden

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:55:32

I'm trying to get the Google PageRank for a list of domains, but I keep ending up with this error:

```
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
```
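When urlopen raises HTTPError, the exception object doubles as a response: it carries the status code, the reason phrase and the response body, which can help tell a deliberate block page apart from other failures. A minimal sketch using only standard urllib calls; fetch_text and describe_http_error are hypothetical helper names, and the offline demo at the bottom builds a fake 403 by hand so it runs without network access.

```python
import io
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarize an HTTPError: status code, reason, and the start of the body."""
    body = err.read(500).decode("utf-8", errors="replace")  # consumes the body
    return f"HTTP {err.code} {err.reason}: {body!r}"


def fetch_text(url, headers=None):
    """Fetch url; on an HTTP error, re-raise with the status and a body snippet."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_http_error(err)) from err


# Offline demo: a hand-built 403, so no request is actually sent.
demo = urllib.error.HTTPError("http://example.com/", 403, "Forbidden", {}, io.BytesIO(b"blocked"))
message = describe_http_error(demo)
print(message)
```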

I've tried several existing solutions to this problem, but none of them fixed it. Here is my code:

```python
# Script for getting the Google PageRank of a page
# Google Toolbar 3.0.x/4.0.x PageRank checksum algorithm
#
# original from http://pagerank.gamesaga.net/
# this version was adapted from http://www.djangosnippets.org/snippets/221/
# by Corey Goldberg - 2010
#
# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php

import urllib.parse
import urllib.request


def get_pagerank(url):
    hsh = check_hash(hash_url(url))
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
    gurl = 'http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=%s&q=info:%s' % (hsh, urllib.parse.quote(url))
    headers = {'User-Agent': user_agent}
    request = urllib.request.Request(gurl, None, headers)  # the assembled request
    with urllib.request.urlopen(request) as u:
        s = u.read().decode('utf-8')
    # print(s)  # debug: raw server response
    rank = s.strip()[9:]  # keep everything after the first 9 characters
    if rank == '':
        rank = 'None'
    return rank


def int_str(string, integer, factor):
    # Multiplicative string hash: multiply, truncate to 32 bits, add the character code.
    for ch in string:
        integer *= factor
        integer &= 0xFFFFFFFF
        integer += ord(ch)
    return integer


def hash_url(string):
    c1 = int_str(string, 0x1505, 0x21)
    c2 = int_str(string, 0, 0x1003F)

    c1 >>= 2
    c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
    c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
    c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)

    t1 = (c1 & 0x3C0) << 4
    t1 |= c1 & 0x3C
    t1 = (t1 << 2) | (c2 & 0xF0F)

    t2 = (c1 & 0xFFFFC000) << 4
    t2 |= c1 & 0x3C00
    t2 = (t2 << 0xA) | (c2 & 0xF0F0000)

    return t1 | t2


def check_hash(hash_int):
    hash_str = str(hash_int)
    flag = 0
    check_byte = 0

    # Walk the decimal digits right to left, doubling every second one
    # and summing the digits of the doubled value.
    i = len(hash_str) - 1
    while i >= 0:
        byte = int(hash_str[i])
        if 1 == (flag % 2):
            byte *= 2
            byte = byte // 10 + byte % 10
        check_byte += byte
        flag += 1
        i -= 1

    check_byte %= 10
    if 0 != check_byte:
        check_byte = 10 - check_byte
        if 1 == flag % 2:
            if 1 == check_byte % 2:
                check_byte += 9
            check_byte >>= 1

    return '7' + str(check_byte) + hash_str
```
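The int_str helper above is a 32-bit multiplicative string hash. With seed 0x1505 (5381) and factor 0x21 (33) it follows the classic djb2 recurrence, and with seed 0 and factor 0x1003F (65599) the sdbm recurrence. The only difference is that int_str truncates before adding each character, so its final value can exceed 32 bits by the last character code, while still agreeing with those hashes modulo 2**32. A small sketch checking that; djb2 and sdbm here are hypothetical reference helpers written only for the comparison.

```python
MASK32 = 0xFFFFFFFF


def int_str(string, integer, factor):
    # Same loop as the script: multiply, truncate to 32 bits, then add the code point.
    for ch in string:
        integer = (integer * factor) & MASK32
        integer += ord(ch)
    return integer


def djb2(s):
    # Reference djb2: h = h * 33 + c, starting from 5381, kept to 32 bits.
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & MASK32
    return h


def sdbm(s):
    # Reference sdbm: h = c + (h << 6) + (h << 16) - h, i.e. h * 65599 + c.
    h = 0
    for ch in s:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & MASK32
    return h


url = "http://www.example.com/"
print(int_str(url, 0x1505, 0x21) & MASK32 == djb2(url))
print(int_str(url, 0, 0x1003F) & MASK32 == sdbm(url))
```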

Can anyone help?

Best answer

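The problem is not that your IP address is blocked. I'm using Python 3 and ran into the same issue. I found that Google blocks urllib requests that don't override the default User-Agent and Accept-Encoding headers.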

These are the headers urllib sent for a test search:

```
GET /search?q=f1+2015 HTTP/1.1
Accept-Encoding: identity
Connection: close
User-Agent: Python-urllib/3.4
Host: 127.0.0.1:8076
```

I set "Accept-Encoding" to an empty string and "User-Agent" to "testing", and the 403 errors stopped.
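A minimal sketch of that fix with urllib.request, assuming the caller supplies the URL. build_request and fetch are hypothetical helper names; the header values are the ones reported above, and there is no guarantee Google still accepts them. Passing Accept-Encoding explicitly also stops http.client from adding its default "Accept-Encoding: identity".

```python
import urllib.request


def build_request(url):
    """Build a Request whose User-Agent and Accept-Encoding replace urllib's defaults."""
    headers = {
        # Anything other than urllib's default "Python-urllib/3.x"; "testing" is the value from the answer.
        "User-Agent": "testing",
        # An explicit (empty) Accept-Encoding suppresses http.client's default "identity".
        "Accept-Encoding": "",
    }
    return urllib.request.Request(url, headers=headers)


def fetch(url, timeout=10):
    """Open the request and return the body decoded as UTF-8 (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the headers without sending anything.
req = build_request("http://example.com/")
print(req.header_items())
```

urllib stores header names with only the first letter capitalized ("User-agent", "Accept-encoding"), so that is the spelling to use with get_header. In get_pagerank, the same two headers could go into its headers dict in place of the Firefox User-Agent alone.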

A similar question about "Python3 - urllib.error.HTTPError: HTTP Error 403: Forbidden" on Stack Overflow: https://stackoverflow.com/questions/28033783/
