
python - How to get all image links and download them using Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:14:21

Here is my code:

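from bs4 import BeautifulSoup
import urllib.request
import re

link = input("Enter the link\n")

# Fetch the page and parse it; the parser is named explicitly
# so BeautifulSoup does not have to guess one
with urllib.request.urlopen(link) as response:
    content = response.read()
soup = BeautifulSoup(content, "html.parser")

# Collect <a> tags whose href is an absolute URL containing ".jpg"
links = [a['href'] for a in soup.find_all('a', href=re.compile(r'http.*\.jpg'))]
print(len(links))
print("\n".join(links))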

When I give this as input:

http://keralapals.com/emmanuel-malayalam-movie-stills

I get this output:

http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-0.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-1.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-2.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-3.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-4.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-5.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-6.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-7.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-8.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-9.jpg
http://keralapals.com/wp-content/uploads/2013/01/Emmanuel-malayalam-movie-mammootty-photos-pics-wallpapers-10.jpg

However, when I give this as input:

http://www.raagalahari.com/actress/13192/regina-cassandra-at-big-green-ganesha-2014.aspx
or
http://www.ragalahari.com/actress/13192/regina-cassandra-at-big-green-ganesha-2014.aspx

it produces no output :(

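So I need to get the links to the original images. This page contains only thumbnails; clicking a thumbnail leads to the original image link. I need to get those image links and download them :( Any help is very welcome.. :)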

Thanks

Muneeb K

Best answer

The problem is that, in the second case, the actual image URLs ending in .jpg are in the src attribute of the img tags, not in the href of the a tags:

<a href="/actress/13192/regina-cassandra-at-big-green-ganesha-2014/image61.aspx">
<img src="http://imgcdn.raagalahari.com/aug2014/starzone/regina-big-green-ganesha/regina-big-green-ganesha61t.jpg" alt="Regina Cassandra" title="Regina Cassandra at BIG Green Ganesha 2014">
</a>

As an option, you can support this kind of link as well:

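links = [a['href'] for a in soup.find_all('a', href=re.compile(r'http.*\.jpg'))]

# Also collect <img> tags whose src ends with ".jpg"; the "x and" guard
# covers <img> tags without a src attribute (x is None for those)
imgs = [img['src'] for img in soup.find_all('img', src=lambda x: x and x.endswith('.jpg'))]
links += imgs

print(len(links))
print("\n".join(links))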

For this URL it prints:

http://imgcdn.raagalahari.com/aug2014/starzone/regina-big-green-ganesha/regina-big-green-ganesha61t.jpg
http://imgcdn.raagalahari.com/aug2014/starzone/regina-big-green-ganesha/regina-big-green-ganesha105t.jpg
http://imgcdn.raagalahari.com/aug2014/starzone/regina-big-green-ganesha/regina-big-green-ganesha106t.jpg
http://imgcdn.raagalahari.com/aug2014/starzone/regina-big-green-ganesha/regina-big-green-ganesha107t.jpg
...

Note that, instead of a regular expression, I pass a function that checks whether the src attribute ends with .jpg.
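To make the function-filter idea concrete, here is a minimal sketch that wraps both lookups in one helper and runs it on a small inline HTML sample instead of a live page. The names `is_jpg`, `collect_jpg_urls`, and the `example.com` URLs are illustrative assumptions, not part of the original code. Unlike the regex above, this variant also accepts relative links and resolves them against the page URL with `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def is_jpg(value):
    # The filter can receive None for tags that lack the attribute,
    # so check for a value before calling string methods on it.
    return bool(value) and value.lower().endswith(".jpg")


def collect_jpg_urls(html, base_url):
    """Return absolute .jpg URLs found in <a href> and <img src>, in page order per tag type."""
    soup = BeautifulSoup(html, "html.parser")
    found = [a["href"] for a in soup.find_all("a", href=is_jpg)]
    found += [img["src"] for img in soup.find_all("img", src=is_jpg)]

    # Resolve relative links against the page URL and drop duplicates.
    urls = []
    for candidate in found:
        absolute = urljoin(base_url, candidate)
        if absolute not in urls:
            urls.append(absolute)
    return urls


# Hypothetical markup, loosely modeled on the two pages in the question.
sample_html = """
<a href="/gallery/image61.aspx"><img src="http://img.example.com/pic61t.jpg"></a>
<a href="/uploads/full-0.jpg">full size</a>
<img alt="no src attribute">
"""

urls = collect_jpg_urls(sample_html, "http://example.com/page.aspx")
print("\n".join(urls))
```

A named function is used here rather than a lambda only so the None check has room for a comment; the behavior is the same as passing a lambda to `find_all`.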

Hope that helps, and that you learned something new today.
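The question also asks how to download the images once the links are collected, which the code above does not cover. One possible approach, sketched here with only the standard library, is to derive a file name from each URL and save it with `urllib.request.urlretrieve`. The helper names `filename_from_url` and `download_all`, the `images` directory, and the `image.jpg` fallback are assumptions made for illustration. Note that the URLs printed for the second page are the thumbnails found on that page, so this would save thumbnails, not the full-size originals.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def filename_from_url(url, fallback="image.jpg"):
    # Use the last path segment as the file name; fall back to a
    # placeholder name when the URL path ends with a slash.
    name = os.path.basename(unquote(urlparse(url).path))
    return name or fallback


def download_all(urls, target_dir="images"):
    """Save each URL into target_dir and return the list of saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(target_dir, filename_from_url(url))
        try:
            urllib.request.urlretrieve(url, path)
        except OSError as exc:
            # URLError and HTTPError are subclasses of OSError.
            print(f"skipped {url}: {exc}")
            continue
        saved.append(path)
    return saved
```

With the `links` list built earlier, usage would be `download_all(links)`. Two images that share the same file name would overwrite each other in this sketch, so a real script may want to add a counter or keep part of the URL path.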

Regarding "python - How to get all image links and download them using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25591025/
