python - Unable to store downloaded files in their respective folders

Reposted. Author: 太空狗. Updated: 2023-10-29 20:26:55

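I've written a script in Python with Selenium to download several document files (ending in .doc) from a web page. I don't want to use the requests or urllib modules to download the files because the site I'm currently working with doesn't have a real URL attached to each file; the links are JavaScript-encrypted. In my script I've therefore chosen a link that mimics the same situation.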

What my script does at the moment:

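  1. Creates a main folder on the desktop.
  2. Creates subfolders inside the main folder, named after the files to be downloaded.
  3. Downloads the files by clicking their links and puts them in the main folder. (This is the part I need to fix.)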
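Steps 1 and 2 can be sketched as one small helper that turns a link into the final path a file should end up at. This is a minimal sketch, not the asker's code: `prepare_target` is a hypothetical name, it uses `urllib.parse.urlparse` so a query string on the link does not leak into the filename, and `os.path.splitext` strips only the last extension (the script's `split(".")[0]` cuts at the first dot).

```python
import os
from urllib.parse import urlparse


def prepare_target(href: str, root: str) -> str:
    """Create a subfolder of `root` named after the file in `href`
    and return the path the file should finally be stored at."""
    # Last path segment of the URL, e.g. "sample.doc"
    filename = os.path.basename(urlparse(href).path)
    # Folder name is the filename without its extension, e.g. "sample"
    folder_name = os.path.splitext(filename)[0]
    folder = os.path.join(root, folder_name)
    # exist_ok=True means an existing folder is reused instead of raising
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)
```

For example, `prepare_target("https://example.com/files/sample.doc", desk_location)` would create `file_folder/sample` and return `file_folder/sample/sample.doc`; the click still saves into Chrome's download directory, so the file has to be moved there afterwards.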

How can I modify my script so that it downloads the files by clicking their links and puts each downloaded file in its corresponding folder?

Here is my attempt so far:

import os
import time
from selenium import webdriver

link ='https://www.online-convert.com/file-format/doc'

dirf = os.path.expanduser('~')
desk_location = dirf + r'\Desktop\file_folder'
if not os.path.exists(desk_location):os.mkdir(desk_location)

def download_files():
    driver.get(link)
    for item in driver.find_elements_by_css_selector("a[href$='.doc']")[:2]:
        filename = item.get_attribute("href").split("/")[-1]
        #creating new folder in accordance with filename to store the downloaded file in their concerning folder
        folder_name = item.get_attribute("href").split("/")[-1].split(".")[0]
        #set the new location of the folders to be created
        new_location = os.path.join(desk_location,folder_name)
        if not os.path.exists(new_location):os.mkdir(new_location)
        #set the location of the folders the downloaded files will be within
        file_location = os.path.join(new_location,filename)
        item.click()

        time_to_wait = 10
        time_counter = 0
        try:
            while not os.path.exists(file_location):
                time.sleep(1)
                time_counter += 1
                if time_counter > time_to_wait:break
        except Exception:pass

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    prefs = {'download.default_directory' : desk_location,
             'profile.default_content_setting_values.automatic_downloads': 1
             }
    chromeOptions.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chromeOptions)
    download_files()

The image below shows how the downloaded files are currently stored (the files end up outside their respective folders):

[Image: downloaded files stored in the main folder instead of their own subfolders]

Best answer

I just added a rename of the file to move it. The script works the way you had it, but once the file has been downloaded it is moved to the correct path:

os.rename(os.path.join(desk_location, filename), file_location)
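The rename only succeeds once the download has finished, so it helps to wait for the file in the download directory (not in the target folder) before moving it. A minimal sketch, assuming Chrome's usual behaviour of writing an in-progress download to `<name>.crdownload` and giving it its final name when complete; `wait_and_move` is a hypothetical helper, and it uses `shutil.move` in place of `os.rename` because it also works across drives.

```python
import os
import shutil
import time


def wait_and_move(download_dir: str, filename: str, target_path: str,
                  timeout: float = 10.0, poll: float = 0.5) -> bool:
    """Wait until `filename` has finished downloading into `download_dir`,
    then move it to `target_path`. Returns False if `timeout` runs out."""
    source = os.path.join(download_dir, filename)
    # Assumption: Chrome keeps an unfinished download under "<name>.crdownload"
    partial = source + ".crdownload"
    deadline = time.monotonic() + timeout
    # Keep polling while the final file is missing or still being written
    while not os.path.exists(source) or os.path.exists(partial):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    shutil.move(source, target_path)
    return True
```

In the loop of the script, `item.click()` would then be followed by `wait_and_move(desk_location, filename, file_location)` instead of the manual counter.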

Full code:

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

link = 'https://www.online-convert.com/file-format/doc'

dirf = os.path.expanduser('~')
desk_location = os.path.join(dirf, 'Desktop', 'file_folder')
if not os.path.exists(desk_location):
    os.mkdir(desk_location)

def download_files():
    driver.get(link)
    for item in driver.find_elements(By.CSS_SELECTOR, "a[href$='.doc']")[:2]:
        filename = item.get_attribute("href").split("/")[-1]
        # create a folder named after the file so the download can be stored in its own folder
        folder_name = filename.split(".")[0]
        # location of the folder to be created
        new_location = os.path.join(desk_location, folder_name)
        if not os.path.exists(new_location):
            os.mkdir(new_location)
        # Chrome first saves the file in its default download directory
        downloaded_file = os.path.join(desk_location, filename)
        # final location of the file, inside its own folder
        file_location = os.path.join(new_location, filename)
        item.click()

        time_to_wait = 10
        time_counter = 0

        # wait (up to time_to_wait seconds) for the file to appear in the download directory
        while not os.path.exists(downloaded_file):
            time.sleep(1)
            time_counter += 1
            if time_counter > time_to_wait:
                break
        # move the downloaded file into its own folder
        if os.path.exists(downloaded_file):
            os.rename(downloaded_file, file_location)

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    prefs = {'download.default_directory': desk_location,
             'profile.default_content_setting_values.automatic_downloads': 1
             }
    chromeOptions.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(options=chromeOptions)
    download_files()
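On the real site described in the question the links carry no true URL, so the filename may not be known from `href` before clicking. One possible workaround, sketched here under stated assumptions, is to list the download directory just before the click and then wait for a new, completed file to show up. `snapshot` and `wait_for_new_file` are hypothetical helpers; treating `.crdownload` (Chrome) and `.tmp` as in-progress suffixes is an assumption.

```python
import os
import time

# Assumed suffixes of unfinished downloads; ".tmp" is included defensively
PARTIAL_SUFFIXES = (".crdownload", ".tmp")


def snapshot(download_dir: str) -> set:
    """Names of the entries currently in the download directory."""
    return set(os.listdir(download_dir))


def wait_for_new_file(download_dir: str, before: set,
                      timeout: float = 10.0, poll: float = 0.5):
    """Return the path of a completed file that was not in `before`,
    or None if none appears (with no partial downloads left) in time."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        # Any partial file means a download is still in progress
        still_downloading = any(n.endswith(PARTIAL_SUFFIXES) for n in names)
        new = sorted(
            n for n in names
            if n not in before
            and not n.endswith(PARTIAL_SUFFIXES)
            and os.path.isfile(os.path.join(download_dir, n))
        )
        if new and not still_downloading:
            return os.path.join(download_dir, new[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Usage would be `before = snapshot(desk_location)` right before `item.click()`, then moving the returned path into the file's folder; the folder name would have to come from the downloaded filename rather than from the link.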

A similar question, python - Unable to store downloaded files in their respective folders, can be found on Stack Overflow: https://stackoverflow.com/questions/54626470/
