
python - Extracting image links with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-01 00:37:06

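I'm trying to extract the image links from a GoT (Game of Thrones) Wikipedia page. The first two links work, but the last two give me a 404 error. I'm struggling to figure out what I'm doing wrong.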

I have tried different combinations to get the correct links.

```python
import requests
from bs4 import BeautifulSoup
import urllib
import urllib.request as request
import re

url = 'https://en.wikipedia.org/w/index.php' + \
      '?title=List_of_Game_of_Thrones_episodes&oldid=802553687'
r = requests.get(url)
html_contents = r.text
soup = BeautifulSoup(html_contents, 'html.parser')

# Find all a tags in the soup
for a in soup.find_all('a'):
    # While looping through the tags, if an 'a' tag contains an img,
    # print its src attribute
    if a.img:
        print('http:/' + a.img['src'])
```

This prints the following image links:

```
http:///upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png
http:///upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Game_of_Thrones_2011_logo.svg/300px-Game_of_Thrones_2011_logo.svg.png
http://static/images/wikimedia-button.png
http://static/images/poweredby_mediawiki_88x31.png
```

The first two links work, but I'd like the last two to work as well.
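A minimal way to see what goes wrong with the last two links is to parse one of them with the standard library's `urllib.parse.urlsplit`. The `src` value below is an assumption, reconstructed from the printed output by removing the `'http:/'` prefix. Because it is a root-relative path (`/static/...`), the concatenation produces `http://static/...`, which names a host called `static` instead of a path on en.wikipedia.org. The first two values start with `//` (protocol-relative), so the same prefix yields `http:///upload.wikimedia.org/...`; browsers skip the extra slash for `http` URLs, which is presumably why those two still open.

```python
from urllib.parse import urlsplit

# Root-relative src value (assumed: the third printed link without the 'http:/' prefix)
src = '/static/images/wikimedia-button.png'

# The question's code builds the URL by plain string concatenation
broken = 'http:/' + src
print(broken)        # http://static/images/wikimedia-button.png

# urlsplit shows how that string is interpreted: 'static' has become the host
parts = urlsplit(broken)
print(parts.netloc)  # static
print(parts.path)    # /images/wikimedia-button.png
```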

Best answer

Thanks for the help. I kept it simple. Here is what worked for me:

```python
# `soup` is the BeautifulSoup object built in the question's code
for a in soup.find_all('a'):
    # Only look at <a> tags that wrap an <img>
    if a.img:
        src = a.img['src']
        if src.startswith('//'):
            # Protocol-relative URL (//upload.wikimedia.org/...): add only the scheme
            print('https:' + src)
        else:
            # Root-relative path (/static/...): prepend the site's scheme and host
            print('https://en.wikipedia.org' + src)
```
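A more general alternative to the prefix checks above is the standard library's `urllib.parse.urljoin`, which resolves a `src` value against the URL of the page it came from. It handles protocol-relative (`//host/...`), root-relative (`/path`) and already-absolute values in one call. This is a swapped-in technique, not what the answer used, and `resolve_src` is a hypothetical helper name chosen for illustration.

```python
from urllib.parse import urljoin

# URL of the page the img tags were scraped from (same as in the question)
PAGE_URL = ('https://en.wikipedia.org/w/index.php'
            '?title=List_of_Game_of_Thrones_episodes&oldid=802553687')


def resolve_src(src, page_url=PAGE_URL):
    """Hypothetical helper: turn an <img> src attribute into an absolute URL."""
    # urljoin applies the standard relative-reference rules:
    #   '//host/path' keeps the page's scheme (https),
    #   '/path' keeps the page's scheme and host,
    #   an absolute URL is returned unchanged.
    return urljoin(page_url, src)


srcs = [
    '//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png',
    '/static/images/wikimedia-button.png',
    '/static/images/poweredby_mediawiki_88x31.png',
]
for s in srcs:
    print(resolve_src(s))
```

Inside the answer's loop, the whole `if`/`else` could then become `print(resolve_src(a.img['src']))`, and it would keep working if the page's own URL changed scheme or host.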

A similar question, "python - Extracting image links with BeautifulSoup", can be found on Stack Overflow: https://stackoverflow.com/questions/57651213/
