
python - How to collect JPEGs with Scrapy

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:53:24

I want to collect idol photos with Scrapy.

The gallery page is https://news.mynavi.jp/article/20191229-947707/ .

I wrote a spider:

(save_gradol.py)

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

from gradol.items import GradolItem


class SaveGradolSpider(CrawlSpider):
    name = 'save_gradol'
    allowed_domains = ['news.mynavi.jp/']
    start_urls = ['https://news.mynavi.jp/article/20191229-947707/']

    rules = (
        Rule(LinkExtractor(allow=(), unique=True), callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        #print("\n>>> Parse " + response.url + " <<<")
        item = GradolItem()
        item["image_urls"].append(start_urls.rsplit("/", 3)[0] + "/" + response.xpath("//a/@href").extract())
        yield item

I also wrote the items:

(items.py)


import scrapy
from scrapy.item import Item, Field

class GradolItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    #image_directory_name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()

I also wrote the pipeline:

(pipelines.py)


import scrapy
from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(object):
    def process_item(self, item, spider):
        return item

I also wrote the settings:

(settings.py)


ITEM_PIPELINES = {'gradol.pipelines.MyImagesPipeline': 1}
IMAGES_STORE = './savedImages'
MEDIA_ALLOW_REDIRECTS = True

Then I ran the spider with [sudo scrapy scrapy save_gradol], but it neither crawls nor collects any photos.

Please help me solve this problem.
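For readers who want to stay with Scrapy, several things in the posted code would likely stop it from working. The entry in `allowed_domains` has a trailing slash, so followed links are probably filtered as off-site. In `parse_page`, `start_urls` is used as a bare name and is a list (which has no `rsplit`), a string is concatenated with the list returned by `extract()`, and `item["image_urls"]` is appended to before that field has been set. Finally, `MyImagesPipeline` is a plain class that only returns the item, so nothing is ever downloaded; Scrapy's built-in `scrapy.pipelines.images.ImagesPipeline` is imported but not used. The usual command for running a spider is `scrapy crawl <name>`. The sketch below covers only the URL-building step, using the standard library so it runs on its own. The helper name `absolute_jpeg_urls` and the sample `hrefs` are made up for illustration and are not taken from the page.

```python
from urllib.parse import urljoin, urlparse


# Hypothetical helper (not part of the original post): turn the href values
# found on a page into absolute URLs and keep only those that point at JPEGs.
def absolute_jpeg_urls(page_url, hrefs):
    urls = []
    for href in hrefs:
        # Resolve relative links against the URL of the page they came from.
        absolute = urljoin(page_url, href)
        path = urlparse(absolute).path.lower()
        if path.endswith((".jpg", ".jpeg")):
            urls.append(absolute)
    return urls


page = "https://news.mynavi.jp/article/20191229-947707/"
# Sample values assumed for illustration only.
hrefs = ["images/001l.jpg", "/article/20191229-947707/images/002l.jpg", "../other/"]
print(absolute_jpeg_urls(page, hrefs))
```

Inside a Scrapy callback, `response.urljoin(href)` performs the same joining. The resulting list would then be assigned to `item["image_urls"]` in one step rather than appended to an unset field.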

Best answer

You can do this the simplest way:

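import requests
from tqdm import tqdm

number_of_photos = 26

# The images are numbered 001l.jpg, 002l.jpg, ... under the article's images/ path.
for i in tqdm(range(1, number_of_photos + 1)):
    image_url = 'https://news.mynavi.jp/article/20191229-947707/images/{:03}l.jpg'.format(i)
    try:
        response = requests.get(image_url)
    except requests.RequestException:
        # Skip this image if the request itself fails (for example a connection error).
        pass
    else:
        if response.status_code == 200:
            # Save the image bytes as 01.jpg, 02.jpg, ...
            with open('{:02}.jpg'.format(i), 'wb') as f:
                f.write(response.content)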

Enjoy.
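The URL pattern in the answer relies on Python's zero-padded format specifiers. As a small illustration, this is what they expand to. The helper names `image_url` and `local_name` and the constant `BASE` are made up here and are not part of the answer.

```python
# Base path taken from the answer's URL template.
BASE = 'https://news.mynavi.jp/article/20191229-947707/images/'


def image_url(i):
    # {:03} zero-pads the number to three digits, e.g. 1 -> 001.
    return BASE + '{:03}l.jpg'.format(i)


def local_name(i):
    # {:02} zero-pads the number to two digits, e.g. 1 -> 01.
    return '{:02}.jpg'.format(i)


print(image_url(1))
print(local_name(1))
print(image_url(26))
```

Building the URLs in a separate function like this makes it easy to check the pattern before sending any requests.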

Regarding python - How to collect JPEGs with Scrapy, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59690938/
