
python - Can't get a script built on requests to produce all image links from a webpage

Reposted. Author: 行者123. Last updated: 2023-12-05 06:11:51

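I'm trying to get all the images from this webpage (https://www.pexels.com/sv-se/sok/office/) using requests. When I run the script I've written so far, I get nothing at all. Even though the images are present in the page source, I can't get the script to work. I'd like to grab all the images that appear when scrolling to the bottom of the page. In the browser dev tools I also noticed links such as https://www.pexels.com/sv-se/sok/office/?format=js&seed=&page=4&type=, where the page number increments as more content is loaded. However, I couldn't get the images from that link either.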
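The paging links seen in dev tools differ only in their `page` parameter. A minimal sketch of building them with the standard library's `urllib.parse.urlencode`, assuming the parameter names copied from the dev-tools URL are all the endpoint needs; `page_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

# Base search URL from the question; the query parameters mirror the
# dev-tools URL (format=js, seed, page, type). Assumption: nothing else is required.
BASE_URL = "https://www.pexels.com/sv-se/sok/office/"


def page_url(page: int) -> str:
    """Build the URL for one page of the infinite-scroll endpoint (hypothetical helper)."""
    params = {"format": "js", "seed": "", "page": page, "type": ""}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for n in range(1, 4):
        print(page_url(n))
```

`page_url(4)` reproduces the exact URL quoted above, so the loop only has to vary the integer.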

Here's what I've written so far:

import requests
from bs4 import BeautifulSoup

url = 'https://www.pexels.com/sv-se/sok/office/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    s.headers['referer'] = 'https://www.pexels.com/sv-se/'
    r = s.get(url)
    soup = BeautifulSoup(r.text,"lxml")
    for item in soup.select("a.photo-item__link > img.photo-item__img"):
        print(item['data-large-src'])
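The question doesn't show why the live request yields nothing. One way to narrow it down is to check what the selector `a.photo-item__link > img.photo-item__img` matches on markup you control. This sketch swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it runs without extra packages; `LargeSrcParser`, `extract_large_srcs`, and `SAMPLE_HTML` are hypothetical, and the sample markup is shaped after the selector, not copied from Pexels:

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LargeSrcParser(HTMLParser):
    """Collect data-large-src from <img class="photo-item__img"> elements that are
    direct children of <a class="photo-item__link">, i.e. what the CSS selector
    "a.photo-item__link > img.photo-item__img" matches."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, classes) for each currently open element
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        parent = self.open_elements[-1] if self.open_elements else None
        if (tag == "img" and "photo-item__img" in classes
                and parent is not None and parent[0] == "a"
                and "photo-item__link" in parent[1]
                and attrs.get("data-large-src")):
            self.urls.append(attrs["data-large-src"])
        if tag not in VOID_TAGS:
            self.open_elements.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break


def extract_large_srcs(html):
    parser = LargeSrcParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical markup for illustration only (not real Pexels HTML).
SAMPLE_HTML = """
<article class="photo-item">
  <a class="photo-item__link" href="/photo/1/">
    <img class="photo-item__img" data-large-src="https://example.com/large-1.jpeg" src="small-1.jpeg">
  </a>
  <a class="photo-item__link" href="/photo/2/">
    <img class="photo-item__img" data-large-src="https://example.com/large-2.jpeg" src="small-2.jpeg" />
  </a>
  <img class="photo-item__img" data-large-src="https://example.com/not-a-child.jpeg">
</article>
"""

if __name__ == "__main__":
    for found in extract_large_srcs(SAMPLE_HTML):
        print(found)
```

Feeding it `r.text` from the script above instead of `SAMPLE_HTML` shows whether the downloaded HTML contains any matching elements at all; if a copy saved from the browser matches but the downloaded one does not, the markup served to `requests` may differ from what the browser renders.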

How can I get all the image links from this webpage using requests?

Best answer

You can try this script to get all the image links from the URL:

import re
import requests
from bs4 import BeautifulSoup  # used below; missing from the original script

url = 'https://www.pexels.com/sv-se/sok/office/?format=js&seed=&page={page}&type='

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
           'Referer': 'https://www.pexels.com/sv-se/sok/office/',
           'X-Requested-With': 'XMLHttpRequest',  # mimic the browser's AJAX request
           'Accept-Language': 'en-US,en;q=0.5'}
cookies = {'locale': 'sv-SE'}

page = 1
picture_num = 1
while True:
    data = requests.get(url.format(page=page), headers=headers, cookies=cookies).text
    # The JS response reports how many pages exist in total
    m = re.search(r'"totalPages"\s*:\s*(\d+)', data)
    if m is None:  # unexpected response (e.g. the format has changed): stop instead of crashing
        break
    total_pages = int(m.group(1))
    # Each photo arrives as an escaped HTML string passed to infiniteScrollingAppender.append(...)
    imgs = re.findall(r"infiniteScrollingAppender\.append\('(.*?)',\s*'", data)

    if page > total_pages:
        break

    for d in imgs:
        # Undo the JavaScript string escaping so the fragment is plain HTML
        d = d.replace(r'\'', "'").replace(r'\"', '"').replace(r'\/', "/").replace(r'\n', '\n')
        print('{}/{} picture_num={}'.format(page, total_pages, picture_num), BeautifulSoup(d, 'html.parser').select_one('[data-large-src]')['data-large-src'])
        picture_num += 1

    page += 1
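The key step in the answer is pulling escaped HTML fragments out of the JavaScript response and unescaping them. This sketch exercises the same two regular expressions and the same `replace` chain offline, against a hypothetical payload reconstructed from those regexes (not captured from Pexels), and again uses `html.parser.HTMLParser` in place of BeautifulSoup; `parse_page`, `unescape_js`, `FirstLargeSrc`, and `SAMPLE_JS` are illustrative names:

```python
import re
from html.parser import HTMLParser

# The same patterns the answer uses.
TOTAL_PAGES_RE = re.compile(r'"totalPages"\s*:\s*(\d+)')
APPEND_RE = re.compile(r"infiniteScrollingAppender\.append\('(.*?)',\s*'")


def unescape_js(fragment):
    """Undo the JavaScript string escapes, in the same order as the answer."""
    return (fragment.replace(r"\'", "'")
            .replace(r'\"', '"')
            .replace(r"\/", "/")
            .replace(r"\n", "\n"))


class FirstLargeSrc(HTMLParser):
    """Remember the first data-large-src value seen (stand-in for select_one)."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        if self.url is None:
            for name, value in attrs:
                if name == "data-large-src":
                    self.url = value
                    break


def parse_page(js_text):
    """Return (total_pages or None, list of data-large-src URLs) for one response."""
    match = TOTAL_PAGES_RE.search(js_text)
    total_pages = int(match.group(1)) if match else None
    urls = []
    for fragment in APPEND_RE.findall(js_text):
        parser = FirstLargeSrc()
        parser.feed(unescape_js(fragment))
        parser.close()
        if parser.url is not None:
            urls.append(parser.url)
    return total_pages, urls


# Hypothetical response body, shaped after the answer's regexes.
SAMPLE_JS = r"""
var data = {"totalPages": 204};
infiniteScrollingAppender.append('<article><a class=\"photo-item__link\"><img data-large-src=\"https:\/\/images.example.com\/1.jpeg\"><\/a><\/article>', 'photo');
infiniteScrollingAppender.append('<article><a class=\"photo-item__link\"><img data-large-src=\"https:\/\/images.example.com\/2.jpeg\"><\/a><\/article>', 'photo');
"""

if __name__ == "__main__":
    total, urls = parse_page(SAMPLE_JS)
    for n, u in enumerate(urls, start=1):
        print(f"1/{total} picture_num={n} {u}")
```

Returning `None` for a missing `totalPages` lets a caller end the loop cleanly. The `replace` chain only covers the four escapes the answer handles; fragments containing, for example, `\uXXXX` sequences or escaped backslashes would need a fuller JavaScript string unescaper.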

Prints:

1/204 picture_num=1 https://images.pexels.com/photos/2041627/pexels-photo-2041627.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=2 https://images.pexels.com/photos/3987020/pexels-photo-3987020.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=3 https://images.pexels.com/photos/3810754/pexels-photo-3810754.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=4 https://images.pexels.com/photos/3178818/pexels-photo-3178818.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=5 https://images.pexels.com/photos/3861958/pexels-photo-3861958.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=6 https://images.pexels.com/photos/3862365/pexels-photo-3862365.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=7 https://images.pexels.com/photos/3746932/pexels-photo-3746932.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=8 https://images.pexels.com/photos/3277806/pexels-photo-3277806.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=9 https://images.pexels.com/photos/1957477/pexels-photo-1957477.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=10 https://images.pexels.com/photos/3184296/pexels-photo-3184296.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=11 https://images.pexels.com/photos/3184357/pexels-photo-3184357.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=12 https://images.pexels.com/photos/4064641/pexels-photo-4064641.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=13 https://images.pexels.com/photos/2041629/pexels-photo-2041629.jpeg?auto=compress&cs=tinysrgb&h=650&w=940
1/204 picture_num=14 https://images.pexels.com/photos/3184359/pexels-photo-3184359.jpeg?auto=compress&cs=tinysrgb&h=650&w=940


...and so on.


Regarding "python - Can't get a script built on requests to produce all image links from a webpage", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63792646/
