
python - 403 Forbidden with urllib2 [Python]

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:14:16

import urllib
import urllib2

url = 'https://www.instagram.com/accounts/login/ajax/'
values = {'username': 'User',
          'password': 'Pass'}

# POST the form-encoded credentials with a browser-like User-Agent header
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers={'User-Agent': 'Mozilla/5.0'})
con = urllib2.urlopen(req)
the_page = con.read()

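Does anyone have any ideas about this? I keep getting a "403 Forbidden" error. Perhaps Instagram has something in place that stops me from connecting via Python (I don't want to connect through their API). What exactly is going on here?

Thanks!

Edit: adding more information.

The error I get is this: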

This page could not be loaded. If you have cookies disabled in your browser, or you are browsing in Private Mode, please try enabling cookies or turning off Private Mode, and then retrying your action.

I edited my code, but I still get that error.

import cookielib
import urllib
import urllib2

# Share one cookie jar across requests so the login POST carries the home page's cookies
jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
print len(jar)  # prints 0
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36')]
result = opener.open('https://www.instagram.com')
print result.getcode(), len(jar)  # prints 200 and 2

url = 'https://www.instagram.com/accounts/login/ajax/'
values = {'username': 'username',
          'password': 'password'}

data = urllib.urlencode(values)

# Passing data makes this a POST request
response = opener.open(url, data)
print response.getcode()
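
For anyone trying to see the server's explanation behind a 403 like the one quoted above, here is a minimal sketch in Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The helper names make_opener and post_form are hypothetical and not part of the code in the question. The sketch only shows how to read the body of an error response; it does not make the login succeed.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Build an opener that stores cookies in memory and sends a User-agent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


def post_form(opener, url, fields):
    """POST form fields and return (status, body), including for 4xx/5xx replies."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    try:
        with opener.open(url, data) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the body sent with
        # the error status can still be read from it.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling post_form(opener, url, values) with the opener from make_opener() returns the status code together with whatever text the server sent, instead of raising on a 403.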

Best answer

For starters, there are two important things:

  • Make sure you stay on the legal side. According to Instagram's Terms of Use:

We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).

You must not create accounts with the Service through unauthorized means, including but not limited to, by using an automated device, script, bot, spider, crawler or scraper.

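Apart from that, Instagram itself is JavaScript-heavy, and you may find it difficult to work with using only urllib2 or requests. If for some reason you cannot use the API, you could look into browser automation via selenium. Note that you can also automate a headless browser such as PhantomJS. Here is sample code to log in: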

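# Written for older Selenium releases that still ship the PhantomJS
# driver and the find_element_by_* helper methods.
from selenium import webdriver

USERNAME = "username"
PASSWORD = "password"

# Start a headless PhantomJS browser and load the home page
driver = webdriver.PhantomJS()
driver.get("https://www.instagram.com")

# Fill in the login form fields, located by their "name" attributes
driver.find_element_by_name("username").send_keys(USERNAME)
driver.find_element_by_name("password").send_keys(PASSWORD)

# Submit by clicking the button whose text is exactly "Log in"
driver.find_element_by_xpath("//button[. = 'Log in']").click()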
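
In Selenium 4 the PhantomJS driver and the find_element_by_* helpers used above are no longer available, so here is a rough equivalent using headless Chrome. This is only a sketch under assumptions: it assumes Chrome and a matching driver are installed, the log_in function name and the timeout value are made up for illustration, and the locators are simply carried over from the code above and may not match the current page.

```python
LOGIN_URL = "https://www.instagram.com"
USERNAME = "username"
PASSWORD = "password"


def log_in(username, password, timeout=10):
    """Open the login page in headless Chrome and submit the form."""
    # Imported inside the function so this module can be loaded
    # on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    driver.get(LOGIN_URL)

    # The form is rendered by JavaScript, so wait until the field exists.
    wait = WebDriverWait(driver, timeout)
    user_field = wait.until(
        EC.presence_of_element_located((By.NAME, "username"))
    )
    user_field.send_keys(username)
    driver.find_element(By.NAME, "password").send_keys(password)

    # Locator copied from the original answer; treat it as an assumption.
    driver.find_element(By.XPATH, "//button[. = 'Log in']").click()
    return driver
```

Usage would be driver = log_in(USERNAME, PASSWORD), followed by driver.quit() once you are done with the session.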

Regarding python - 403 Forbidden with urllib2 [Python], we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34974117/
