imagenet - How to get images of a selected class from ImageNet?

Reposted by: 行者123  Updated: 2023-12-03 10:03:08

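Background
I have been playing with Deep Dream and Inceptionism, using the Caffe framework to visualize layers of GoogLeNet, an architecture built for the ImageNet project, a large visual database designed for visual object recognition.
You can find ImageNet here: Imagenet 1000 Classes.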

To explore the architecture and generate "dreams", I am using three notebooks:

https://github.com/google/deepdream/blob/master/dream.ipynb

https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb

https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb

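The basic idea here is to extract some features from each channel in a specified layer of the model or of a "guide" image.
Then we feed the image we wish to modify into the model and extract the features in the same specified layer (for each octave),
enhancing the best-matching features, i.e. those with the largest dot product between the two feature vectors.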
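As a toy illustration of this matching step, here is a minimal NumPy sketch. The function name and the tiny arrays are made up for illustration; it mirrors the idea described above rather than the notebooks' exact code.

```python
import numpy as np

def best_matching_guide_features(x, y):
    """For each position of x, pick the guide feature vector with the largest dot product.

    x: (channels, n_positions) features of the image being modified
    y: (channels, m_positions) features of the guide image
    Returns an array with the same shape as x.
    """
    A = x.T.dot(y)             # (n_positions, m_positions) matrix of dot products
    return y[:, A.argmax(1)]   # best guide column for every position of x

# Toy data: 2 channels, 3 positions in the image, 2 positions in the guide
x = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0]])
y = np.array([[5.0, 0.0],
              [0.0, 7.0]])
matched = best_matching_guide_features(x, y)
print(matched)
```

In a guided dream, an array like `matched` would be written into the layer's gradient so that the ascent step strengthens whatever already resembles the guide.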

So far I have managed to modify input images and control the dreams using the following approaches:

(a) applying layers as 'end' objectives for the input image optimization. (see Feature Visualization)

(b) using a second image to guide the optimization objective on the input image.

(c) visualize GoogLeNet model classes generated from noise.

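However, the effect I want to achieve sits in between these techniques, and I have not found any documentation, paper, or code for it.
Desired result (not part of the question to be answered)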

To have one single class or unit belonging to a given 'end' layer (a) guide the optimization objective (b) and have this class visualized (c) on the input image:

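An example, where class = 'face' and input_image = 'clouds.jpg':

Please note: the image above was generated using a face-recognition model that was not trained on the ImageNet dataset. It is for demonstration purposes only.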

Working code

Approach (a)

from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import matplotlib.pyplot as plt
import caffe
         
model_name = 'GoogLeNet' 
model_path = 'models/dream/bvlc_googlenet/' # substitute your path here
net_fn   = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
   
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))
    
net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
                       mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
                       channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB

def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = StringIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))
  
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']
def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
      
def objective_L2(dst):
    dst.diff[:] = dst.data 

def make_step(net, step_size=1.5, end='inception_4c/output', 
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''

    src = net.blobs['data'] # input image is stored in Net's 'data' blob
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift
            
    net.forward(end=end)
    objective(dst)  # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image
            
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)

 
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4, 
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    
    for i in xrange(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
    
    src = net.blobs['data']
    
    detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
    
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)

        src.reshape(1,3,h,w) # resize the network's input image size
        src.data[0] = octave_base+detail
        
        for i in xrange(iter_n):
            make_step(net, end=end, clip=clip, **step_params)
            
            # visualization
            vis = deprocess(net, src.data[0])
            
            if not clip: # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)

            print octave, i, end, vis.shape
            clear_output(wait=True)
            
        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])

I run the code above with:

end = 'inception_4c/output'
img = np.float32(PIL.Image.open('clouds.jpg'))
_=deepdream(net, img)

Approach (b)

"""
Use one single image to guide 
the optimization process.

This affects the style of generated images 
without using a different training set.
"""

def dream_control_by_image(optimization_objective, end):
    # this image will shape input img
    guide = np.float32(PIL.Image.open(optimization_objective))  
    showarray(guide)
  
    h, w = guide.shape[:2]
    src, dst = net.blobs['data'], net.blobs[end]
    src.reshape(1,3,h,w)
    src.data[0] = preprocess(net, guide)
    net.forward(end=end)

    guide_features = dst.data[0].copy()
    
    def objective_guide(dst):
        x = dst.data[0].copy()
        y = guide_features
        ch = x.shape[0]
        x = x.reshape(ch,-1)
        y = y.reshape(ch,-1)
        A = x.T.dot(y) # compute the matrix of dot-products with guide features
        dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best

    _=deepdream(net, img, end=end, objective=objective_guide)

I run the code above with:

end = 'inception_4c/output'
# image to be modified
img = np.float32(PIL.Image.open('img/clouds.jpg'))
guide_image = 'img/guide.jpg'
dream_control_by_image(guide_image, end)

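Question
The approach that fails is my attempt to access individual classes by one-hot encoding the class matrix and focusing on a single class (so far to no avail):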

def objective_class(dst, class_idx=50):
   # according to imagenet classes 
   #50: 'American alligator, Alligator mississipiensis',
   one_hot = np.zeros_like(dst.data)
   one_hot.flat[class_idx] = 1.
   # note: one_hot.flat[class_idx] is the scalar 1.0, so this fills all of dst.diff with 1
   dst.diff[:] = one_hot.flat[class_idx]
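One thing that stands out in the snippet above is that its last line assigns a single scalar to the whole gradient rather than the one-hot array. The following is only a sketch of what copying the full one-hot array could look like. It assumes `dst` is a blob-like object whose `data` and `diff` arrays have one entry per class; `fake_blob` is a hypothetical stand-in for a Caffe blob so the example runs without Caffe. Whether this objective produces the desired visual effect is not something the source establishes.

```python
import numpy as np
from types import SimpleNamespace

def objective_class(dst, class_idx=50):
    """Send gradient only through one class unit (sketch, not the author's method)."""
    one_hot = np.zeros_like(dst.data)
    one_hot.flat[class_idx] = 1.0
    dst.diff[:] = one_hot   # copy the whole one-hot array, not a scalar

# Hypothetical stand-in for a Caffe blob with 1000 class scores
fake_blob = SimpleNamespace(
    data=np.random.rand(1, 1000).astype(np.float32),
    diff=np.zeros((1, 1000), dtype=np.float32),
)
objective_class(fake_blob, class_idx=50)
print(fake_blob.diff.sum(), fake_blob.diff[0, 50])
```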

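To clarify: the question is not about the dream code, which is interesting background and already works, but only about the last paragraph: could someone guide me on how to get images of a selected class from ImageNet (taking class #50: 'American alligator, Alligator mississipiensis'), so that I can use them as input, together with the cloud image, to create a dream image?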

Best answer

The question is how to get images of the selected class #50: 'American alligator, Alligator mississipiensis' from ImageNet.

Go to image-net.org.

Go to "Download".

Follow the instructions for "Download Image URLs":

How to download the URLs of a synset from your browser?
1. Type a query in the Search box and click "Search" button

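The alligator is not shown. "ImageNet is under maintenance. Only ILSVRC synsets are included in the search results." No problem, the similar animal "alligator lizard" is good enough, since this search only serves to reach the right branch of the WordNet treemap. I do not know whether you could get the ImageNet images directly here if there were no maintenance.

2. Open a synset page

Scrolling down:

Looking for the American alligator, which also happens to be a saurian diapsid reptile, as a near neighbour: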

3. You will find the "Download URLs" button under the left-bottom corner of the image browsing window.

You will get all the URLs of the selected class. A text file pops up in the browser:
http://image-net.org/api/text/imagenet.synset.geturls?wnid=n01698640
We see here that it is just a matter of knowing the right WordNet id, which needs to be put at the end of the URL.
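As a small sketch of that observation, the address can be assembled from a WordNet id. The helper name and the "n plus eight digits" format check are assumptions added for illustration; only the base URL and the example id come from the text above.

```python
import re

GETURLS_ENDPOINT = "http://image-net.org/api/text/imagenet.synset.geturls?wnid="

def synset_urls_endpoint(wnid):
    """Return the URL-list address for a WordNet id such as 'n01698640'."""
    # Assumed id format: the letter 'n' followed by eight digits
    if not re.fullmatch(r"n\d{8}", wnid):
        raise ValueError(f"unexpected WordNet id: {wnid!r}")
    return GETURLS_ENDPOINT + wnid

print(synset_urls_endpoint("n01698640"))
```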
Manual image download
The text file looks as follows:

http://farm1.static.flickr.com/136/326907154_d975d0c944.jpg

http://weeksbay.org/photo_gallery/reptiles/American20Alligator.jpg

...

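and so on, up to image number 1261.

For example, the first URL links to:

The second is a dead link:

The third link is dead as well, but the fourth still works.

The images behind these URLs are publicly available, but many of the links are dead, and the images are of lower resolution.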
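Because many links are dead, a response is not guaranteed to be an image at all. A minimal sketch of a sanity check on downloaded bytes follows, assuming the images are JPEG files as the `.jpg` URLs suggest; the helper name is hypothetical.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the start of the next marker

def looks_like_jpeg(data):
    """Return True if the bytes begin like a JPEG file."""
    return data[:3] == JPEG_MAGIC

jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 16   # made-up bytes with a JPEG header
error_page = b"<html>Not Found</html>"           # made-up error page body
print(looks_like_jpeg(jpeg_like), looks_like_jpeg(error_page))
```

A check like this only inspects the header; it does not prove that the file decodes correctly or shows the expected class.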
Automated image download
Again from the ImageNet guide:

How to download by HTTP protocol? To download a synset by HTTP request, you need to obtain the "WordNet ID" (wnid) of a synset first. When you use the explorer to browse a synset, you can find the WordNet ID below the image window. (Click Here and search "Synset WordNet ID" to find out the wnid of "Dog, domestic dog, Canis familiaris" synset). To learn more about the "WordNet ID", please refer to
Mapping between ImageNet and WordNet
Given the wnid of a synset, the URLs of its images can be obtained at
http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]
You can also get the hyponym synsets given wnid, please refer to API documentation to learn more.

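So what is in the API documentation?
It has everything needed to get all the WordNet IDs (the so-called "synset IDs") and the words of all synsets; that is, it puts every class name and its WordNet ID at your fingertips, for free.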

Obtain the words of a synset

Given the wnid of a synset, the words of the synset can be obtained at
http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=[wnid]
You can also Click Here to download the mapping between WordNet ID and words for all synsets, Click Here to download the mapping between WordNet ID and glosses for all synsets.

If you know the WordNet ids of your choice and their class names, you can use nltk.corpus.wordnet from "nltk" (the Natural Language Toolkit); see the WordNet interface.
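A minimal sketch of how a wnid might be turned into an NLTK synset follows. It assumes a reasonably recent NLTK with the `wordnet` corpus downloaded, and that the digits after the leading letter are the WordNet offset. Both helper names are made up for illustration.

```python
def wnid_to_pos_offset(wnid):
    """Split an ImageNet-style id like 'n01698640' into ('n', 1698640)."""
    return wnid[0], int(wnid[1:])

def wnid_to_synset(wnid):
    """Look up the NLTK synset for a wnid (needs nltk and its 'wordnet' corpus)."""
    # Imported lazily so that wnid_to_pos_offset works without nltk installed
    from nltk.corpus import wordnet as wn
    pos, offset = wnid_to_pos_offset(wnid)
    return wn.synset_from_pos_and_offset(pos, offset)

print(wnid_to_pos_offset("n01698640"))
```

With the corpus installed, `wnid_to_synset("n01698640").lemma_names()` should list the words of that synset.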
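In our case, we just need the images of class #50: 'American alligator, Alligator mississipiensis'. We already know what we need, so we can leave nltk.corpus.wordnet aside (see tutorials or Stack Exchange questions for more). We can automate the download of all alligator images by looping through the URLs that are still alive. Of course, we could also extend this to the full WordNet by looping over all WordNet IDs, although that would take far too much time for the whole treemap. It is also not advised, since the images will stop being available if thousands of people download them every day.
I am afraid I will not take the time to write the Python code that accepts the ImageNet class number "#50" as its argument, although that should be possible as well, using a mapping table from WordNet to ImageNet. The class name and WordNet ID should be enough.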
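For readers who do want to start from a class number, a sketch of such a mapping table could look like this. It assumes a label file with one "wnid words" line per class, ordered so that the line position equals the class index; that layout is an assumption, not something the source confirms. The sample contains only the one line known from the text above.

```python
# One line in the assumed "wnid words" layout; wnid and class name come from the text above.
SAMPLE_LINES = "n01698640 American alligator, Alligator mississipiensis\n"

def load_class_table(text, first_index=0):
    """Map class index -> (wnid, words), taking the line position as the class index."""
    table = {}
    for offset, line in enumerate(text.splitlines()):
        wnid, _, words = line.strip().partition(" ")
        if wnid:
            table[first_index + offset] = (wnid, words)
    return table

# first_index=50 only because this excerpt starts at class #50;
# a complete 1000-line file would use the default first_index=0.
table = load_class_table(SAMPLE_LINES, first_index=50)
wnid, words = table[50]
print(wnid, "->", words)
```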
For a single WordNet ID, the code could be as follows:

import csv
import urllib.request

wnid = "n01698640"
url = "http://image-net.org/api/text/imagenet.synset.geturls?wnid=" + wnid

# From https://stackoverflow.com/a/45358832/6064933
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
# Save the list of image URLs for this synset (one URL per line)
with open(wnid + ".csv", "wb") as f:
    with urllib.request.urlopen(req) as r:
        f.write(r.read())

failed = []  # collected across the whole loop instead of being reset per URL
with open(wnid + ".csv", "r") as f:
    for counter, line in enumerate(f, start=1):
        if counter == 10:  # demo limit: only the first 9 URLs are tried
            break
        image_url = line.strip()  # drop the trailing newline before requesting
        if not image_url:
            continue
        print(image_url)
        try:
            with urllib.request.urlopen(image_url) as r2:
                with open(f"{wnid}_{counter:05}.jpg", "wb") as f2:
                    f2.write(r2.read())
        except Exception:
            # dead or unreachable link: remember its number and URL
            failed.append([f"{counter:05}", image_url])

# Write one row per failed download
with open(wnid + "_failed.csv", "w", newline="") as f3:
    writer = csv.writer(f3)
    writer.writerows(failed)

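Result:

If you need the original-quality images, even those behind the dead links, and your project is non-commercial, you can sign in; see "How do I get a copy of the images?" in the Download FAQ.

In the URL above, you see wnid=n01698640 at the end of the URL, which is the WordNet id that is mapped to ImageNet.

Or, in the "Images of the Synset" tab, just click on "Wordnet ID".

to arrive at:

or right-click and choose "Save as":

You can use the WordNet id to get the original images.

If you are a commercial user, I would say contact the ImageNet team.

Add-on
Taking up the idea from a comment: if you do not want many images, but just the "one single class image" that represents the class as much as possible, have a look at Visualizing GoogLeNet Classes and try to use that method with ImageNet images. It also uses the deepdream code.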

Visualizing GoogLeNet Classes

July 2015

Ever wondered what a deep neural network thinks a Dalmatian should look like? Well, wonder no more.

Recently Google published a post describing how they managed to use deep neural networks to generate class visualizations and modify images through the so called “inceptionism” method. They later published the code to modify images via the inceptionism method yourself, however, they didn’t publish code to generate the class visualizations they show in the same post.

While I never figured out exactly how Google generated their class visualizations, after butchering the deepdream code and this ipython notebook from Kyle McDonald, I managed to coach GoogLeNet into drawing these:

... [with many other example images to follow]

Regarding "imagenet - How to get images of a selected class from ImageNet?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49162455/
