gpt4 book ai didi

python - 如何在python中从img html中抓取src

转载 作者:行者123 更新时间:2023-12-01 00:37:41 24 4
gpt4 key购买 nike

我正在尝试抓取 img 的 src,但是我发现的代码返回了许多 img src,但不是我想要的。我不明白我做错了什么。我正在“https://www.tripadvisor.dk/Restaurant_Review-g189541-d15804886-Reviews-The_Pescatarian-Copenhagen_Zealand.html”上抓取 TripAdvisor

这是我试图从中提取的 HTML 片段:

 <div class="restaurants-detail-overview-cards-LocationOverviewCard__cardColumn--2ALwF"><h6>Placering og kontaktoplysninger</h6><span><div><span data-test-target="staticMapSnapshot" class=""><img class="restaurants-detail-overview-cards-LocationOverviewCard__mapImage--22-Al" src="https://trip-raster.citymaps.io/staticmap?scale=1&amp;zoom=15&amp;size=347x137&amp;language=da&amp;center=55.687988,12.596316&amp;markers=icon:http%3A%2F%2Fc1.tacdn.com%2F%2Fimg2%2Fmaps%2Ficons%2Fcomponent_map_pins_v1%2FR_Pin_Small.png|55.68799,12.596316"></span></div></span>

我希望代码返回:(来自 src 的子字符串)

55.68799,12.596316

我已经尝试过:

    import pandas as pd
pd.options.display.max_colwidth = 200
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import re

web_url = "https://www.tripadvisor.dk/Restaurant_Review-g189541-d15804886-Reviews-The_Pescatarian-Copenhagen_Zealand.html"
url = urlopen(web_url)
url_html = url.read()

soup = bs(url_html, 'lxml')
soup.find_all('img')

for link in soup.find_all('img'):
print(link.get('src'))

返回的内容与此类似,但不是我需要的 src :

https://static.tacdn.com/img2/branding/rebrand/TA_logo_secondary.svg
https://static.tacdn.com/img2/branding/rebrand/TA_logo_primary.svg
https://static.tacdn.com/img2/branding/rebrand/TA_logo_secondary.svg
data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==
data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==

最佳答案

您只需请求并重新即可完成此操作。它只是 src 的坐标部分,它是基于位置的变量。

import requests, re

p = re.compile(r'"coords":"(.*?)"')
r = requests.get('https://www.tripadvisor.dk/Restaurant_Review-g189541-d15804886-Reviews-The_Pescatarian-Copenhagen_Zealand.html')
coords = p.findall(r.text)[1]
src = f'https://trip-raster.citymaps.io/staticmap?scale=1&zoom=15&size=347x137&language=da&center={coords}&markers=icon:http://c1.tacdn.com//img2/maps/icons/component_map_pins_v1/R_Pin_Small.png|{coords}'
print(src)
print(coords)

关于python - 如何在python中从img html中抓取src,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57625093/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com