python - My scraper throws an error instead of downloading images

Reposted. Author: 行者123. Updated: 2023-12-01 03:01:51

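I wrote a scraper to download images from a website. When I run it, however, it throws this error: [raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 403]. I have used the same approach to scrape images from other sites without any problem. I can't figure out why this error occurs or how to work around it. I hope someone can take a look.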

import requests
import urllib.request
from lxml import html

def PictureScraping():
    url = "https://www.yify-torrent.org/search/1080p/"
    response = requests.get(url)
    tree = html.fromstring(response.text)
    titles = tree.xpath('//div[@class="movie-image"]')
    for title in titles:
        Pics = "https:" + title.xpath('.//img/@src')[0]
        urllib.request.urlretrieve(Pics, Pics.split('/')[-1])

PictureScraping()
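
To see exactly what the server returns for one image URL, the failing urllib call can be wrapped so that the HTTP status is reported instead of raising. This is only a diagnostic sketch. build_request and fetch_or_report are hypothetical helper names, the "Mozilla/5.0" User-Agent value is illustrative, and whether sending a browser-like header changes the 403 on this particular site is an untested assumption.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that carries an explicit User-Agent header (illustrative value)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_or_report(url):
    """Return the response body, or print the HTTP status and return None on refusal."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (403 in the question above)
        print(f"{url} -> HTTP {err.code} {err.reason}")
        return None
```

Calling fetch_or_report on a single image URL from the loop shows whether the refusal is tied to the request itself rather than to the parsing code.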

Best answer

You need to download the images with the same web-scraping session that you used to fetch the initial page. Working code:

import requests
from lxml import html


def PictureScraping():
    url = "https://www.yify-torrent.org/search/1080p/"
    with requests.Session() as session:
        response = session.get(url)

        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        for title in titles:
            image_url = title.xpath('.//img/@src')[0]
            image_name = image_url.split('/')[-1]
            print(image_name)
            image_url = "https:" + image_url

            # download image
            response = session.get(image_url, stream=True)
            if response.status_code == 200:
                with open(image_name, 'wb') as f:
                    for chunk in response.iter_content(1024):
                        f.write(chunk)

PictureScraping()
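
The download step in the code above can also be pulled into small helpers, which makes it easier to reuse and to see failures. The following is a sketch under assumptions, not part of the original answer. The names absolute_image_url, image_name_from_url and download_image are hypothetical, the timeout and chunk size are arbitrary, and raise_for_status() replaces the answer's status_code == 200 check so that a 403 is raised rather than silently skipped. The session argument is expected to be a requests.Session, as in the answer.

```python
import posixpath
from pathlib import Path
from urllib.parse import urljoin, urlsplit


def absolute_image_url(page_url, src):
    """Resolve a src attribute (relative or protocol-relative) against the page URL."""
    return urljoin(page_url, src)


def image_name_from_url(image_url):
    """Use the last path segment as the file name, ignoring any query string."""
    return posixpath.basename(urlsplit(image_url).path)


def download_image(session, image_url, dest_dir="."):
    """Stream one image to disk through the given session and return the saved path."""
    response = session.get(image_url, stream=True, timeout=10)
    response.raise_for_status()  # raises on 4xx/5xx instead of skipping the file
    target = Path(dest_dir) / image_name_from_url(image_url)
    with open(target, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return target


# Usage inside the answer's loop (assumes `import requests` and an open session):
#     download_image(session, absolute_image_url(url, src))
```

Using urljoin instead of prepending "https:" keeps the helper working if the site ever returns relative or already-absolute src values.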

Regarding "python - My scraper throws an error instead of downloading images", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43738941/
