
python - BeautifulSoup: how do I get the URL when an img src contains ../..?

Reposted. Author: 行者123. Updated: 2023-12-01 05:53:37

Suppose I'm trying to get the link to an image, like this:

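import os
import urllib2
import urlparse

from bs4 import BeautifulSoup

# BeautifulSoup parses markup; it does not download a URL, so fetch the page first.
html = urllib2.urlopen("http://examplesite.com").read()
soup = BeautifulSoup(html, "html.parser")
for image in soup.findAll("img", src=True):
    src = image["src"]           # raw, possibly relative, src attribute
    srcd = urlparse.urlparse(src)
    path = srcd.path             # gets the path
    fn = os.path.basename(path)  # gets the filename

# Let's say the webpage I was scraping had its images like this:
# <img src="../..someimage.jpg" />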

Is there a simple way to get the full URL from this? Or do I have to use a regular expression?

Best answer

Use urlparse.urljoin (in Python 3 the same function is urllib.parse.urljoin):

>>> import urlparse
>>> base_url = "http://example.com/foo/"
>>> urlparse.urljoin(base_url, "../bar")
'http://example.com/bar'
>>> urlparse.urljoin(base_url, "/baz")
'http://example.com/baz'
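
To tie this back to the scraping loop in the question, here is a minimal sketch in Python 3, where urljoin lives in urllib.parse. The helper name resolve_srcs, the page URL, and the src values are all hypothetical and chosen only for illustration. The key assumption is that the base passed to urljoin is the URL of the page the img tags came from.

```python
from urllib.parse import urljoin


def resolve_srcs(page_url, srcs):
    """Resolve each (possibly relative) src against the URL of the page it came from."""
    return [urljoin(page_url, src) for src in srcs]


# Hypothetical page URL and src values, for illustration only.
page_url = "http://example.com/a/b/page.html"
srcs = ["../../someimage.jpg", "/img/logo.png", "http://cdn.example.com/x.png"]

for url in resolve_srcs(page_url, srcs):
    print(url)
```

With BeautifulSoup, the src values would typically come from something like (img["src"] for img in soup.find_all("img", src=True)). Already-absolute URLs pass through urljoin unchanged, so no regular expression or special-casing should be needed.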

Regarding "python - BeautifulSoup: how do I get the URL when an img src contains ../..?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13403691/
