
python-3.x - How to download a PDF in Python 3 using the Selenium module (Firefox)

Reposted. Author: 行者123. Updated: 2023-12-02 07:21:08

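I want to download a PDF from an online magazine. To open it, I have to log in first; then I open the PDF and download it.

Below is my code. It can log in to the page and open the PDF, but it cannot download the PDF because I am not sure how to simulate clicking Save. I am using Firefox.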

import os, time
from selenium import webdriver
from bs4 import BeautifulSoup

# Use the Firefox downloader to get the file
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", 'D:/eBooks/Stocks_andCommodities/2008/Jul/')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
# Boolean preferences need Python booleans, not the strings "true"/"false"
fp.set_preference("pdfjs.disabled", True)

# disable Adobe Acrobat PDF preview plugin
fp.set_preference("plugin.scan.plid.all", False)
fp.set_preference("plugin.scan.Acrobat", "99.0")

# Selenium 3 API: firefox_profile= and find_element_by_* were removed in later Selenium 4 releases
browser = webdriver.Firefox(firefox_profile=fp)

# Get the login web page
web_url = 'http://technical.traders.com/sub/sublogin2.asp'
browser.get(web_url)

# Simulate the authentication
user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
user_name.send_keys("thomas2003@test.net")
password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
password.send_keys("LastName")
time.sleep(2)
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
submit.click()
time.sleep(2)

# Open the PDF for downloading
# Raw string so the backslashes in the file parameter are taken literally
url = r'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\131INTR.pdf'
browser.get(url)
time.sleep(10)

# How do I simulate clicking Save/Download for the PDF here?

Best answer

You should not open the file in the browser. Once you have the file URL, create a requests session that carries all of the browser's cookies:

import requests

def get_request_session(driver):
    # Copy every cookie from the Selenium browser into a requests session
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie['name'], cookie['value'])

    return session
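
The helper above copies only each cookie's name and value. A slightly stricter variant, sketched below, also carries over the domain, path, secure flag, and expiry that Selenium reports, so that each cookie is only sent to the host it was set for. The names `cookie_kwargs` and `session_from_cookies` are illustrative and not part of Selenium or requests; the input is assumed to have the shape returned by `driver.get_cookies()`.

```python
def cookie_kwargs(cookie):
    """Map one Selenium cookie dict to keyword arguments for
    requests' RequestsCookieJar.set()."""
    kwargs = {}
    for key in ("domain", "path", "secure"):
        if key in cookie:
            kwargs[key] = cookie[key]
    # Selenium names the expiry timestamp "expiry"; requests expects "expires".
    if "expiry" in cookie:
        kwargs["expires"] = cookie["expiry"]
    return kwargs


def session_from_cookies(cookies):
    """Build a requests.Session from a list of Selenium cookie dicts."""
    import requests  # imported lazily so cookie_kwargs can be used without it

    session = requests.Session()
    for cookie in cookies:
        session.cookies.set(cookie["name"], cookie["value"], **cookie_kwargs(cookie))
    return session
```

It would be called as `session = session_from_cookies(driver.get_cookies())`. Keys that requests does not accept, such as `httpOnly` and `sameSite`, are deliberately left out.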

Once you have the session, you can use it to download the file:

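# driver is the Selenium WebDriver instance (named browser in the question)
url = r'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\131INTR.pdf'
session = get_request_session(driver)
r = session.get(url, stream=True)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page
chunk_size = 2000
with open('/tmp/mypdf.pdf', 'wb') as file:
    for chunk in r.iter_content(chunk_size):
        file.write(chunk)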
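
If the login has expired, a server will often answer with an HTML login page rather than the PDF, and that page would then be saved under a `.pdf` name. A minimal sanity check, assuming the file is a standard PDF (which normally begins with the bytes `%PDF-`), might look like this; `looks_like_pdf` is a hypothetical helper name, not a library function.

```python
def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF magic bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

After the download, `looks_like_pdf('/tmp/mypdf.pdf')` returning False suggests the cookies were not accepted and the login step should be repeated.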

Regarding "python-3.x - How to download a PDF in Python 3 using the Selenium module (Firefox)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46026983/
