
python - Stuck on web scraping code

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:45:21

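I have the following code, which is meant to go to a web page, pull down all the relevant comics from the site and store them on my computer. The first image downloads fine, but there seems to be a problem with the loop that moves to the previous page on the site. I'd be grateful if someone could look over the code and help. The error I get is: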

Traceback (most recent call last):
  File "C:\Users\528000\Desktop\kids print\Comic-gather.py", line 41, in <module>
    prevLink = soup.select('a[class="prevLink"]')[0]
IndexError: list index out of range


import requests, os, bs4
url = 'http://darklegacycomics.com'
os.makedirs('darklegacy', exist_ok=True)
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text)
    comicElem = soup.select('.comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        try:
            comicUrl = 'http://darklegacycomics.com' + comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            # skip this comic
            prevLink = soup.select('.prevlink')[0]
            url = 'http://darklegacycomics.com' + prevLink.get('href')
            continue
        # Save the image to ./darklegacy.
        imageFile = open(os.path.join('darklegacy', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's url.
    prevLink = soup.select('a[class="prevLink"]')[0]
    url = 'http://darklegacycomics.com' + prevLink.get('href')
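The `IndexError` comes from the last step: `soup.select()` always returns a list, and when the selector matches nothing on the page that list is empty, so `[0]` fails. (The code also uses two different selectors for the same link, `.prevlink` and `a[class="prevLink"]`, and CSS class names are case-sensitive, so they cannot both match one element.) Below is a minimal sketch of a more defensive way to follow the previous links. It assumes the link can be found with a CSS selector you pass in; the selector `a.prevLink`, the helper names `find_prev_url`/`walk_back` and the fake pages in the example are hypothetical, since the real site's markup isn't shown here. `select_one()` returns `None` instead of raising, and the loop stops when no link is found, when a URL repeats, or after `max_pages` pages, instead of relying on the URL ending in `#`.

```python
from typing import Callable, List, Optional, Set
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_prev_url(html: str, page_url: str, selector: str) -> Optional[str]:
    """Return the absolute URL of the 'previous' link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns None when nothing matches, unlike select(...)[0],
    # which raises IndexError on the empty list.
    link = soup.select_one(selector)
    if link is None or not link.get("href"):
        return None
    # Resolve relative hrefs ("/2", "page2.html") against the current page.
    return urljoin(page_url, link["href"])


def walk_back(start_url: str,
              fetch_html: Callable[[str], str],
              selector: str,
              max_pages: int = 1000) -> List[str]:
    """Follow 'previous' links from start_url; return the URLs visited, in order.

    Stops when a page has no matching link, when a URL repeats (for example
    a first comic that links to itself), or after max_pages pages.
    """
    visited: List[str] = []
    seen: Set[str] = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(visited) < max_pages:
        visited.append(url)
        seen.add(url)
        # fetch_html is supplied by the caller, e.g. lambda u: requests.get(u).text
        url = find_prev_url(fetch_html(url), url, selector)
    return visited


if __name__ == "__main__":
    # Hypothetical pages standing in for the real site.
    pages = {
        "http://example.com/3": '<a class="prevLink" href="/2">Prev</a>',
        "http://example.com/2": '<a class="prevLink" href="/1">Prev</a>',
        "http://example.com/1": "<p>First comic, no previous link</p>",
    }
    print(walk_back("http://example.com/3", pages.__getitem__, "a.prevLink"))
```

In the real script, the download-and-save code from the question would run once per URL in the walk, with `fetch_html` doing the `requests.get`.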

Best answer

This will get all of your images:

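import requests, os, bs4
from urllib.parse import urljoin  # Python 3 (Python 2: from urlparse import urljoin)
url = 'http://darklegacycomics.com'

# name the parser explicitly so bs4 does not have to guess (and warn)
soup = bs4.BeautifulSoup(requests.get(url).content, 'html.parser')

# get all img tags inside .comic whose src value starts with /image
# (the value is quoted because /image is not a valid CSS identifier)
links = soup.select('.comic img[src^="/image"]')


for img in links:
    # extract the link
    src = img["src"]
    # use the image name as the file name; open in binary mode ("wb")
    # because the response content is bytes
    with open(os.path.basename(src), "wb") as f:
        # join the base and image url and write the content to disk
        f.write(requests.get(urljoin(url, src)).content)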
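Using `urljoin` instead of `'http://darklegacycomics.com' + src` matters because `src` values are not always root-relative. The small standard-library sketch below uses a hypothetical helper `image_target` and made-up image paths. It shows how `urljoin` resolves the common forms of `src`, and how to take the file name from the URL path only, so that a query string does not end up in the saved file's name.

```python
import os
from typing import Tuple
from urllib.parse import urljoin, urlparse


def image_target(page_url: str, src: str) -> Tuple[str, str]:
    """Return (absolute image URL, local file name) for an <img src> value."""
    # urljoin handles relative ("image/a.png"), root-relative ("/image/a.png"),
    # protocol-relative ("//cdn.example.com/a.png") and absolute src values.
    absolute = urljoin(page_url, src)
    # Build the file name from the URL path only, so "?v=2" is dropped.
    filename = os.path.basename(urlparse(absolute).path)
    return absolute, filename


if __name__ == "__main__":
    for src in ("/image/123.png", "//cdn.example.com/a.png?v=2"):
        print(image_target("http://darklegacycomics.com", src))
```

By contrast, plain concatenation turns a protocol-relative `src` such as `//cdn.example.com/a.png` into `http://darklegacycomics.com//cdn.example.com/a.png`, which is not the image's URL.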

For more on python - Stuck on web scraping code, see the similar question on Stack Overflow: https://stackoverflow.com/questions/36360606/
