python - Need to download the PDF, not the web page content

Reposted. Author: 行者123. Updated: 2023-12-01 02:23:52
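As things stand, I am able to fetch the web page content at the PDF link EXAMPLE OF THE LINK HERE. However, I don't want the content of the web page; I want the content of the PDF, so that I can save it into a PDF folder on my computer.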

I have already done this successfully on websites that do not require a login and are not behind a proxy server.

Relevant code:

import os
import urllib2
import time
import requests
import urllib3
from random import *


s = requests.Session()
data = {"Username":"username", "Password":"password"}
url = "https://login.url.com"

print "doing things"
r2 = s.post(url, data=data, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)

#I get a response 200 from printing r2
print r2


download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"

maxCounter = 1  # placeholder: the counter is not defined in the original snippet
file = open(r"F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)

#This prints out the response from the proxy server (i.e. 200)
print temp

something = uniform(5,6)
print something
time.sleep(something)

#This gets me the content of the web page, not the content of the PDF
print temp.content

file.write(temp.content)
file.close()

I need help understanding how to "download" the PDF content.

Best answer

Try this:

import requests

url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'

pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
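
One way to tell whether the server returned the PDF or an HTML page (for example a login form) is to inspect the response before saving it. The sketch below is an illustration, not part of the original answer: `looks_like_pdf` is a hypothetical helper that checks the `Content-Type` header value and the `%PDF-` signature that PDF files start with.

```python
def looks_like_pdf(content, content_type=""):
    """Return True if the bytes look like a PDF document.

    content      -- raw response body as bytes (e.g. response.content)
    content_type -- value of the Content-Type header, if known
    """
    # PDF files begin with the signature "%PDF-"; tolerate leading
    # whitespace, which some servers emit before the document.
    has_signature = content.lstrip()[:5] == b"%PDF-"
    # An HTML Content-Type is a strong hint that a web page came back instead.
    is_html = "html" in content_type.lower()
    return has_signature and not is_html


if __name__ == "__main__":
    # Offline demonstration with hand-made bodies (no network access needed).
    print(looks_like_pdf(b"%PDF-1.4\n...", "application/pdf"))
    print(looks_like_pdf(b"<html><body>Login</body></html>", "text/html"))
```

With requests, the check could be called as `looks_like_pdf(pdf.content, pdf.headers.get("Content-Type", ""))` before writing `walmart.pdf`.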

Edit

Try again using a requests session to manage the cookies (assuming the site sends you cookies after you log in), and possibly use a different proxy.

proxy_dict = {'https': 'ip:port'}

# login_url, data and download_url are assumed to be defined as in the question
with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # (POST here, matching the login call in the question)
    # data variable should hold user/password information
    auth = session.post(login_url, data=data, proxies=proxy_dict, verify=False)
    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        # We're continuing the same session, so the login cookies are sent along
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))
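
One thing worth checking here: requests chooses a proxy by URL scheme, and the mapping above has only an `https` key while the document URL starts with `http://`, so that entry may not apply to the download request. The following is a minimal sketch under that assumption. `build_proxies` and `download_pdf` are hypothetical helpers, and `session` is anything that behaves like `requests.Session`.

```python
def build_proxies(proxy_address):
    """Map both URL schemes to the same proxy, e.g. 'http://PROXYip:PORT'."""
    return {"http": proxy_address, "https": proxy_address}


def download_pdf(session, url, dest_path, proxies=None):
    """Fetch url with an already logged-in session and save it to dest_path.

    session must offer a .get(url, proxies=...) method returning an object
    with .status_code and .content, as requests.Session does.
    Returns True if a file was written, False otherwise.
    """
    response = session.get(url, proxies=proxies)
    if response.status_code != 200:
        return False
    # Refuse to save a body that does not start with the PDF signature,
    # such as an HTML login page.
    if not response.content.startswith(b"%PDF-"):
        return False
    with open(dest_path, "wb") as handle:
        handle.write(response.content)
    return True
```

After logging in with `session.post(login_url, data=data, proxies=proxies)`, the download step would be `download_pdf(session, download_url, 'walmart.pdf', proxies)`, where `proxies = build_proxies('http://PROXYip:PORT')`. A `False` result means the server answered with something other than a PDF, which usually points back at the login or proxy setup.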

Regarding "python - Need to download the PDF, not the web page content", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47642578/
