
Python: download all links in a dataframe column

Reposted. Author: bug小助手. Updated: 2023-10-24 23:40:44



Here is an example of my dataframe:



id pdf
1 https://ia802902.us.archive.org/10/items/EL103_L_1978_03_024_01_1_PF_03/EL103_L_1978_03_024_01_1_PF_03.pdf
2 https://ia801900.us.archive.org/31/items/EL103_L_1978_03_033_07_1_PF_05/EL103_L_1978_03_033_07_1_PF_05.pdf
3 https://ia802900.us.archive.org/35/items/EL105_L_1978_03_072_03_1_PF_05/EL105_L_1978_03_072_03_1_PF_05.pdf

I want to download each PDF listed in the 'pdf' column. I tried the following code (source: https://www.geeksforgeeks.org/downloading-pdfs-with-python-using-requests-and-beautifulsoup/):



import requests
from bs4 import BeautifulSoup

for url in df["pdf"]:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    links = soup.find_all('a')
    i = 0
    for link in links:
        if ('.pdf' in link.get('href', [])):
            i += 1
            print("Downloading file: ", i)

            response = requests.get(link.get('href'))
            pdf = open("C:/myfolder"+str(i)+".pdf", 'wb')
            pdf.write(response.content)
            pdf.close()
            print("File ", i, " downloaded")

The code runs, but it does not download any files. I would also like to keep each PDF's original file name (for example: EL103_L_1978_03_024_01_1_PF_03.pdf). Any suggestions?



Recommended answer:

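The links in the 'pdf' column already point directly at the PDF files, so there is no HTML page to parse: BeautifulSoup most likely finds no <a> tags in the PDF data, and the inner loop never downloads anything. Request each URL directly instead, for example: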



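import requests

# df is the dataframe from the question; each value in df["pdf"] is a direct PDF URL
for pdf_url in df["pdf"]:
    # keep the original name: the part of the URL after the last "/"
    file_name = pdf_url.split("/")[-1]
    with open(file_name, "wb") as f_out:
        print("Downloading", pdf_url)
        # the response body is the PDF itself, so write its raw bytes
        f_out.write(requests.get(pdf_url).content)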

Prints:



Downloading https://ia802902.us.archive.org/10/items/EL103_L_1978_03_024_01_1_PF_03/EL103_L_1978_03_024_01_1_PF_03.pdf
Downloading https://ia801900.us.archive.org/31/items/EL103_L_1978_03_033_07_1_PF_05/EL103_L_1978_03_033_07_1_PF_05.pdf
Downloading https://ia802900.us.archive.org/35/items/EL105_L_1978_03_072_03_1_PF_05/EL105_L_1978_03_072_03_1_PF_05.pdf

and saves them as:



andrej@MyPC:~/app$ ls -alF *pdf
-rw-r--r-- 1 root root 792942 sep 10 22:54 EL103_L_1978_03_024_01_1_PF_03.pdf
-rw-r--r-- 1 root root 559170 sep 10 22:54 EL103_L_1978_03_033_07_1_PF_05.pdf
-rw-r--r-- 1 root root 935443 sep 10 22:54 EL105_L_1978_03_072_03_1_PF_05.pdf
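Splitting on "/" works for these archive.org links. If some URLs in the column could carry a query string (e.g. "?download=1") or percent-encoded characters, a slightly more defensive variant parses the URL path first. This is a minimal sketch using only the standard library; the function name pdf_name_from_url is hypothetical:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def pdf_name_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string or fragment."""
    path = urlsplit(url).path        # keeps only the path; drops "?query" and "#fragment"
    name = PurePosixPath(path).name  # last component of the URL path
    return unquote(name)             # turn escapes such as "%20" back into characters


url = "https://ia802902.us.archive.org/10/items/EL103_L_1978_03_024_01_1_PF_03/EL103_L_1978_03_024_01_1_PF_03.pdf"
print(pdf_name_from_url(url))                  # EL103_L_1978_03_024_01_1_PF_03.pdf
print(pdf_name_from_url(url + "?download=1"))  # EL103_L_1978_03_024_01_1_PF_03.pdf
```

In the loop above, file_name = pdf_name_from_url(pdf_url) would replace the split call.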


Thank you. I just made a small change: with open("path/to/folder/" + file_name, "wb") as f_out
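Concatenating strings such as "C:/myfolder" + file_name only works if the folder string already ends with a separator; otherwise the file name is glued onto the folder name (the question's "C:/myfolder"+str(i)+".pdf" yields C:/myfolder1.pdf, not a file inside myfolder). A minimal sketch using pathlib, which inserts the separator and can create the folder first; save_pdf and the folder name are hypothetical:

```python
from pathlib import Path


def save_pdf(content: bytes, folder: str, file_name: str) -> Path:
    """Write content to folder/file_name, creating the folder if needed."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    target = out_dir / file_name                # "/" joins with the correct separator
    target.write_bytes(content)                 # opens, writes and closes the file
    return target


# Hypothetical usage inside the answer's loop (needs requests and network access):
#     file_name = pdf_url.split("/")[-1]
#     save_pdf(requests.get(pdf_url).content, "C:/myfolder", file_name)
```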

