
python - Downloading files with mechanize

Reposted. Author: 太空狗. Updated: 2023-10-30 03:06:39

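I have a browser instance with a page already open. I want to download and save every file it links to (they are all PDFs). Does anyone know how to do this?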

Thanks

Best answer

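import os
import re
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urljoin, urlparse
#http://www.crummy.com/software/BeautifulSoup/ - required (beautifulsoup4 package)
from bs4 import BeautifulSoup

HOST = 'https://www.adobe.com/'

# Keep cookies across requests, like a browser session
cj = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
# Make urlretrieve() below go through the same opener (and cookie jar)
urllib.request.install_opener(opener)

req = opener.open(HOST + 'pdf')
response = req.read()
page_url = req.geturl()  # final URL of the page, after any redirects

soup = BeautifulSoup(response, 'html.parser')
# Every <a> tag whose href contains ".pdf"
pdfs = soup.find_all(name='a', attrs={'href': re.compile(r'\.pdf')})
for pdf in pdfs:
    # Resolve relative links against the page URL; absolute http/https links are kept as-is
    url = urljoin(page_url, pdf['href'])
    # Save under the last path segment of the URL, e.g. "report.pdf"
    filename = os.path.basename(urlparse(url).path) or 'download.pdf'
    try:
        #https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve
        urllib.request.urlretrieve(url, filename)
    except Exception as e:
        print('cannot obtain url %s' % (url,))
        print('from href %s' % (pdf['href'],))
        print(e)
    else:
        print('downloaded file')
        print(url)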
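The answer fetches the page again with urllib rather than reusing the mechanize browser from the question. If the page is already open in a mechanize.Browser, a minimal sketch that stays inside that session might look like the following. It assumes mechanize's Browser.links(url_regex=...), the Link.absolute_url attribute and Browser.retrieve(url, filename); the helper names download_pdfs and pdf_filename and the dest_dir parameter are hypothetical. The links are collected into a list before any download starts, because the downloads go through the same browser object.

```python
import os
from urllib.parse import urlparse


def pdf_filename(url):
    """Hypothetical helper: local file name taken from the last path segment of a URL."""
    name = os.path.basename(urlparse(url).path)
    return name or "download.pdf"


def download_pdfs(br, dest_dir="."):
    """Hypothetical helper: save every PDF linked from the page open in `br`.

    `br` is assumed to be a mechanize.Browser that has already opened the
    page; only br.links(), Link.absolute_url and br.retrieve() are used.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Same test as the answer: the link URL contains ".pdf".
    # Materialize the list first so downloading cannot disturb the iteration.
    links = list(br.links(url_regex=r"\.pdf"))
    saved = []
    for link in links:
        url = link.absolute_url  # href resolved against the page's URL
        path = os.path.join(dest_dir, pdf_filename(url))
        # retrieve() goes through the browser, so it should reuse its cookies
        br.retrieve(url, path)
        saved.append(path)
    return saved
```

With the browser from the question this would be called as `download_pdfs(br, "pdfs")`. In this sketch, links whose URLs end in the same file name overwrite each other.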

A similar question about python - Downloading files with mechanize can be found on Stack Overflow: https://stackoverflow.com/questions/7640762/
