
python - How to return the image with the largest dimensions

Reposted. Author: 行者123. Updated: 2023-12-02 16:58:49

I have been able to filter out all the image URLs in a page and display them one after another

import shutil

import cv2
import numpy as np
import requests
from bs4 import BeautifulSoup


article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"
response = requests.get(article_URL)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find('body').find_all('img')
i = 0
image_url = []
for im in images:
    print(im)
    i += 1
    url = im.get('src')
    image_url.append(url)
    print('Downloading: ', url)
    try:
        response = requests.get(url, stream=True)
        with open(str(i) + '.jpg', 'wb') as out_file:
            shutil.copyfileobj(response.raw, out_file)
        del response
    except Exception:
        print('Could not download: ', url)

new = [x for x in image_url if x is not None]
for url in new:
    resp = requests.get(url, stream=True).raw
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    # height, width, channels = image.shape
    height, width, _ = image.shape
    dimension = []
    for items in height, width:
        dimension.append(items)
    # print(height, width)
    print(dimension)
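I want to print the image with the largest dimensions from the list of URLs.
This is the result I get from the list, one [height, width] pair per image, which is not good enough: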
[72, 72]
[95, 96]
[13, 60]
[227, 973]
[17, 60]
[229, 771]

Best answer

I see two problems.

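  • You create dimension = [] inside the loop, so the previous values are discarded on every iteration. You have to create dimension = [] before the loop, and inside the loop use
    dimension.append( (width, height) )
    After the loop you can use max(dimension) to get the pair with the largest width (tuples are compared by their first element first)
  • You keep only width, height in dimension, so you don't know which file has this size. You should keep all the information
    dimension.append( (width, height, url, filename) )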

  • My version.
    I use a list of dictionaries, data, to keep all the information
    data.append({
        'url': url,
        'path': filename,
        'width': width,
        'height': height,
    })
    Then I use key= in max() to get the item with the largest width
    max(data, key=lambda x:x['width'])
    But I could use x['height'] or x['width'] * x['height'] in the same way.
    import requests
    from bs4 import BeautifulSoup
    import shutil
    import cv2

    article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"

    response = requests.get(article_URL)
    soup = BeautifulSoup(response.text, 'html.parser')
    images = soup.find('body').find_all('img')

    # --- loop ---

    data = []  # created once, before the loop, so every image is kept
    i = 0

    for img in images:
        print('HTML:', img)

        url = img.get('src')

        if url:  # skip `url` with `None`
            print('Downloading:', url)
            try:
                response = requests.get(url, stream=True)

                i += 1
                url = url.rsplit('?', 1)[0]  # remove ?opt=20 after filename
                ext = url.rsplit('.', 1)[-1]  # .png, .jpg, .jpeg
                filename = f'{i}.{ext}'
                print('Filename:', filename)

                with open(filename, 'wb') as out_file:
                    shutil.copyfileobj(response.raw, out_file)

                # cv2.imread returns None if the file cannot be decoded;
                # the resulting error is caught by the except below
                image = cv2.imread(filename)
                height, width = image.shape[:2]

                data.append({
                    'url': url,
                    'path': filename,
                    'width': width,
                    'height': height,
                })

            except Exception as ex:
                print('Could not download: ', url)
                print('Exception:', ex)

        print('---')

    # --- after loop ---

    # note: max() raises ValueError if `data` is empty
    print('max:', max(data, key=lambda x:x['width']))

    all_sorted = sorted(data, key=lambda x:x['width'], reverse=True)

    print('Top 3:', all_sorted[:3])
    # or
    for item in all_sorted[:3]:
        print(item['width'], item['url'])

    BTW: to get only the images that have a src attribute
     .find_all('img', {'src': True})
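
To make the effect of the key function concrete, here is a minimal sketch that needs no network access. It reuses the six [height, width] pairs printed in the question; the 'name' labels and the area() helper are hypothetical, added only for illustration, and stand in for the real URLs and filenames.

```python
# (height, width) pairs copied from the question's output.
# The 'name' labels are hypothetical placeholders for real URLs.
sizes = [(72, 72), (95, 96), (13, 60), (227, 973), (17, 60), (229, 771)]

data = []  # created once, before the loop, so earlier entries are kept
for index, (height, width) in enumerate(sizes, start=1):
    data.append({'name': f'image-{index}', 'width': width, 'height': height})


def area(item):
    """Hypothetical helper: pixel area of one record."""
    return item['width'] * item['height']


widest = max(data, key=lambda item: item['width'])
tallest = max(data, key=lambda item: item['height'])
largest = max(data, key=area)

print('widest: ', widest)
print('tallest:', tallest)
print('largest:', largest)
```

With this data the widest and the largest-by-area record are the same image (973 wide, 227 high), while the tallest is a different one (771 wide, 229 high), so which key to use depends on what "largest" should mean.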

Regarding "python - How to return the image with the largest dimensions", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63345648/
