
imagenet - How to get images of a selected class from ImageNet?

Reposted. Author: 行者123. Updated: 2023-12-03 10:03:08

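Background
I have been playing with Deep Dream and Inceptionism, using the Caffe framework to visualize the layers of GoogLeNet, an architecture built for the ImageNet project, a large visual database designed for visual object recognition.
You can find ImageNet here: Imagenet 1000 Classes.

To explore the architecture and generate "dreams", I am using three notebooks: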

  • https://github.com/google/deepdream/blob/master/dream.ipynb
  • https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb
  • https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb

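  • The basic idea here is to extract some features from each channel in a specified layer of the model, or of a "guide" image.
    Then we feed the image we wish to modify into the model and extract the features in the same specified layer (for each octave),
    enhancing the best-matching features, i.e. the largest dot product of the two feature vectors.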
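As a minimal sketch of that matching step, assuming both feature maps have already been flattened to shape (channels, positions), the NumPy snippet below picks, for every position of the image features, the guide feature vector with the largest dot product. The function name `best_matching_guide_features` and the toy arrays are made up for illustration; the rule itself mirrors the `objective_guide` function shown further down.

```python
import numpy as np

def best_matching_guide_features(x, y):
    """For each column of x, return the column of y with the largest dot product.

    x: image features, shape (channels, n_positions_x)
    y: guide features, shape (channels, n_positions_y)
    """
    A = x.T.dot(y)                  # (n_positions_x, n_positions_y) dot products
    return y[:, A.argmax(axis=1)]   # (channels, n_positions_x)

# Toy data (hypothetical): 2 channels, 3 image positions, 2 guide positions.
x = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0]])
y = np.array([[1.0, 0.0],
              [0.0, 3.0]])
target = best_matching_guide_features(x, y)
print(target)
```

In the dream code this target is written into the layer's `diff` blob, so the backward pass pushes the image toward the guide's best-matching features.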

    So far I have managed to modify input images and control the dreams using the following approaches:
    • (a) applying layers as 'end' objectives for the input image optimization. (see Feature Visualization)
    • (b) using a second image to guide the optimization objective on the input image.
    • (c) visualizing GoogLeNet model classes generated from noise.


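    However, the effect I want to achieve sits somewhere between these techniques, and I have not found any documentation, papers, or code for it.
    Desired result (not part of the question to be answered)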

    To have one single class or unit belonging to a given 'end' layer (a) guide the optimization objective (b) and have this class visualized (c) on the input image:


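    An example, where class = 'face' and input_image = 'clouds.jpg':
    [image]
    Please note: the image above was generated with a face recognition model that was not trained on the ImageNet dataset. It is for demonstration purposes only.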

    Working code

    Approach (a)

    from io import BytesIO
    import numpy as np
    import scipy.ndimage as nd
    import PIL.Image
    from IPython.display import clear_output, Image, display
    from google.protobuf import text_format
    import matplotlib.pyplot as plt
    import caffe

    model_name = 'GoogLeNet'
    model_path = 'models/dream/bvlc_googlenet/'  # substitute your path here
    net_fn = model_path + 'deploy.prototxt'
    param_fn = model_path + 'bvlc_googlenet.caffemodel'

    # patch the model definition so that gradients can be computed
    model = caffe.io.caffe_pb2.NetParameter()
    text_format.Merge(open(net_fn).read(), model)
    model.force_backward = True
    open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))

    net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
                           mean=np.float32([104.0, 116.0, 122.0]),  # ImageNet mean, training set dependent
                           channel_swap=(2, 1, 0))  # the reference model has channels in BGR order instead of RGB

    def showarray(a, fmt='jpeg'):
        a = np.uint8(np.clip(a, 0, 255))
        f = BytesIO()
        PIL.Image.fromarray(a).save(f, fmt)
        display(Image(data=f.getvalue()))

    # a couple of utility functions for converting to and from Caffe's input image layout
    def preprocess(net, img):
        return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

    def deprocess(net, img):
        return np.dstack((img + net.transformer.mean['data'])[::-1])

    def objective_L2(dst):
        dst.diff[:] = dst.data

    def make_step(net, step_size=1.5, end='inception_4c/output',
                  jitter=32, clip=True, objective=objective_L2):
        '''Basic gradient ascent step.'''

        src = net.blobs['data']  # input image is stored in Net's 'data' blob
        dst = net.blobs[end]

        ox, oy = np.random.randint(-jitter, jitter+1, 2)
        src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2)  # apply jitter shift

        net.forward(end=end)
        objective(dst)  # specify the optimization objective
        net.backward(start=end)
        g = src.diff[0]
        # apply normalized ascent step to the input image
        src.data[:] += step_size/np.abs(g).mean() * g

        src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2)  # unshift image

        if clip:
            bias = net.transformer.mean['data']
            src.data[:] = np.clip(src.data, -bias, 255-bias)


    def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4,
                  end='inception_4c/output', clip=True, **step_params):
        # prepare base images for all octaves
        octaves = [preprocess(net, base_img)]

        for i in range(octave_n-1):
            octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale, 1.0/octave_scale), order=1))

        src = net.blobs['data']

        detail = np.zeros_like(octaves[-1])  # allocate image for network-produced details

        for octave, octave_base in enumerate(octaves[::-1]):
            h, w = octave_base.shape[-2:]

            if octave > 0:
                # upscale details from the previous octave
                h1, w1 = detail.shape[-2:]
                detail = nd.zoom(detail, (1, 1.0*h/h1, 1.0*w/w1), order=1)

            src.reshape(1, 3, h, w)  # resize the network's input image size
            src.data[0] = octave_base+detail

            for i in range(iter_n):
                make_step(net, end=end, clip=clip, **step_params)

                # visualization
                vis = deprocess(net, src.data[0])

                if not clip:  # adjust image contrast if clipping is disabled
                    vis = vis*(255.0/np.percentile(vis, 99.98))
                showarray(vis)

                print(octave, i, end, vis.shape)
                clear_output(wait=True)

            # extract details produced on the current octave
            detail = src.data[0]-octave_base
        # returning the resulting image
        return deprocess(net, src.data[0])
    I run the code above with:
    end = 'inception_4c/output'
    img = np.float32(PIL.Image.open('clouds.jpg'))
    _=deepdream(net, img)

    Approach (b)

    """
    Use one single image to guide
    the optimization process.

    This affects the style of generated images
    without using a different training set.
    """

    def dream_control_by_image(optimization_objective, end):
        # this image will shape input img
        guide = np.float32(PIL.Image.open(optimization_objective))
        showarray(guide)

        h, w = guide.shape[:2]
        src, dst = net.blobs['data'], net.blobs[end]
        src.reshape(1, 3, h, w)
        src.data[0] = preprocess(net, guide)
        net.forward(end=end)

        guide_features = dst.data[0].copy()

        def objective_guide(dst):
            x = dst.data[0].copy()
            y = guide_features
            ch = x.shape[0]
            x = x.reshape(ch, -1)
            y = y.reshape(ch, -1)
            A = x.T.dot(y)  # compute the matrix of dot-products with guide features
            dst.diff[0].reshape(ch, -1)[:] = y[:, A.argmax(1)]  # select ones that match best

        _ = deepdream(net, img, end=end, objective=objective_guide)
    I run the code above with:
    end = 'inception_4c/output'
    # image to be modified
    img = np.float32(PIL.Image.open('img/clouds.jpg'))
    guide_image = 'img/guide.jpg'
    dream_control_by_image(guide_image, end)
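    Problem
    The approach that currently fails is my attempt to access a single class by one-hot encoding the class matrix and focusing on one entry (to no avail so far):
    def objective_class(dst, class_idx=50):
        # according to the ImageNet classes
        # 50: 'American alligator, Alligator mississipiensis'
        one_hot = np.zeros_like(dst.data)
        one_hot.flat[class_idx] = 1.
        dst.diff[:] = one_hot
    To clarify: the question is not about the dream code, which is interesting background and already works, but only about the question in this last paragraph: could someone guide me on how to get images of a chosen class (taking class #50: 'American alligator, Alligator mississipiensis') from ImageNet (so that I can use them as input, together with the cloud image, to create a dream image)?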

    Best answer

    The question is how to get images of the chosen class #50: 'American alligator, Alligator mississipiensis' from ImageNet.

  • Go to image-net.org.
  • Go to "Download".
  • Follow the instructions for "Download Image URLs":

    How to download the URLs of a synset from your browser?

    1. Type a query in the Search box and click "Search" button

    [image]
    The alligator does not show up. "ImageNet is under maintenance. Only ILSVRC synsets are included in the search results." No problem: we are fine with the similar animal "alligator lizard", since this search is only about getting to the right branch of the WordNet tree. I do not know whether you would get the ImageNet images directly here even without the maintenance.
    2. Open a synset page

    [image]
    Scroll down:
    [image]
    Scroll down:
    [image]
    Look for the American alligator, which happens to be a saurian diapsid reptile as well, as a near neighbor:
    [image]
    3. You will find the "Download URLs" button under the left-bottom corner of the image browsing window.

    [image]
    You will get all of the URLs of the selected class. A text file pops up in the browser:
    http://image-net.org/api/text/imagenet.synset.geturls?wnid=n01698640
    We see here that it is just a matter of knowing the right WordNet id to put at the end of the URL.
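A small sketch of that observation: building the request URL from a WordNet id. The endpoint is the one quoted above, and whether it still responds is not guaranteed; the helper name `synset_urls_endpoint` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the answer; its availability is an assumption.
GETURLS_ENDPOINT = "http://image-net.org/api/text/imagenet.synset.geturls"

def synset_urls_endpoint(wnid):
    """Return the URL that lists the image URLs of one synset."""
    return GETURLS_ENDPOINT + "?" + urlencode({"wnid": wnid})

alligator_url = synset_urls_endpoint("n01698640")
print(alligator_url)
```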
    Manual image download
    The text file looks like this:
    [image]
  • http://farm1.static.flickr.com/136/326907154_d975d0c944.jpg
  • http://weeksbay.org/photo_gallery/reptiles/American20Alligator.jpg
  • ...
  • up to image number 1261.

  • For example, the first URL links to:
    [image]
    The second is a dead link:
    [image]
    The third link is dead as well, but the fourth is still working.
    [image]
    The images behind these URLs are publicly available, but many links are dead, and the images are of lower resolution.
    Automated image download
    Again from the ImageNet guide:

    How to download by HTTP protocol? To download a synset by HTTP request, you need to obtain the "WordNet ID" (wnid) of a synset first. When you use the explorer to browse a synset, you can find the WordNet ID below the image window. (Click Here and search "Synset WordNet ID" to find out the wnid of "Dog, domestic dog, Canis familiaris" synset). To learn more about the "WordNet ID", please refer to

    Mapping between ImageNet and WordNet

    Given the wnid of a synset, the URLs of its images can be obtained at

    http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]

    You can also get the hyponym synsets given wnid, please refer to API documentation to learn more.


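    So what is in the API documentation?
    It has everything needed to get all of the WordNet IDs (so-called "synset IDs") and the words of all their synsets; that is, it has any class name and its WordNet ID at hand, for free.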

    Obtain the words of a synset

    Given the wnid of a synset, the words of the synset can be obtained at

    http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=[wnid]

    You can also Click Here to download the mapping between WordNet ID and words for all synsets, Click Here to download the mapping between WordNet ID and glosses for all synsets.
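Once such a mapping file is at hand, finding the WordNet ID for a class name is a simple text search. The sketch below assumes one synset per line in the form `<wnid><TAB><words>`; that layout is an assumption about the downloadable file, not something the quoted guide states. `parse_wnid_words` and `find_wnids` are hypothetical names.

```python
def parse_wnid_words(text):
    """Parse lines of the assumed form '<wnid><TAB><words>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        wnid, _, words = line.partition("\t")
        mapping[wnid.strip()] = words.strip()
    return mapping

def find_wnids(mapping, query):
    """Return the WordNet IDs whose words contain the query (case-insensitive)."""
    q = query.lower()
    return [wnid for wnid, words in mapping.items() if q in words.lower()]

# Two-line excerpt in the assumed format.
sample = (
    "n01697457\tAfrican crocodile, Nile crocodile, Crocodylus niloticus\n"
    "n01698640\tAmerican alligator, Alligator mississipiensis\n"
)
mapping = parse_wnid_words(sample)
print(find_wnids(mapping, "alligator"))
```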


    If you know the WordNet ids of choice and their class names, you can use nltk.corpus.wordnet of "nltk" (Natural Language Toolkit); see the WordNet interface.
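A rough sketch of how a wnid relates to an NLTK synset: the id is read here as a part-of-speech letter followed by an eight-digit WordNet offset. It assumes that ImageNet ids and the installed NLTK WordNet corpus refer to the same WordNet version; `split_wnid`, `wnid_to_synset` and `synset_to_wnid` are hypothetical helper names, and the NLTK lookup only works with `nltk` and its `wordnet` corpus installed.

```python
import re

WNID_PATTERN = re.compile(r"^([a-z])(\d{8})$")

def split_wnid(wnid):
    """Split e.g. 'n01698640' into ('n', 1698640)."""
    m = WNID_PATTERN.match(wnid)
    if m is None:
        raise ValueError(f"not a WordNet ID: {wnid!r}")
    return m.group(1), int(m.group(2))

def wnid_to_synset(wnid):
    """Look the synset up with NLTK (optional dependency, imported lazily)."""
    from nltk.corpus import wordnet as wn
    pos, offset = split_wnid(wnid)
    return wn.synset_from_pos_and_offset(pos, offset)

def synset_to_wnid(synset):
    """Build the wnid string back from an NLTK synset."""
    return f"{synset.pos()}{synset.offset():08d}"

pos, offset = split_wnid("n01698640")
print(pos, offset)
```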
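    In our case, we only need the images of class #50: 'American alligator, Alligator mississipiensis'; we already know what we need, so we can leave nltk.corpus.wordnet aside (see tutorials or Stack Exchange questions for more). We can automate the download of all alligator images by looping over the URLs that are still alive. We could of course also extend this to the full WordNet with a loop over all WordNet IDs, although this would take far too much time for the whole tree, and it is not recommended either, since the images will stop being available if 1000 people download them every day.
    I am afraid I will not take the time to write this Python code that accepts the ImageNet class number "#50" as a parameter, though that should be possible as well, using a mapping table from WordNet to ImageNet. The class name and WordNet ID should be enough.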
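One possible shape for that class-number lookup, as a sketch only: it assumes a `synset_words.txt` file like the one distributed with the Caffe ILSVRC12 auxiliary data, with one `<wnid> <class names>` line per class and the line position equal to the class index. Both the file and that ordering are assumptions here, and `parse_synset_words` is a hypothetical name.

```python
def parse_synset_words(text):
    """Return a list of (wnid, class_name); the list position is the class index."""
    classes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        wnid, _, name = line.partition(" ")
        classes.append((wnid, name))
    return classes

# Three-line excerpt in the assumed format, only to demonstrate the parsing.
# With the complete file, classes[50] would be the entry to look at.
excerpt = """n01697457 African crocodile, Nile crocodile, Crocodylus niloticus
n01698640 American alligator, Alligator mississipiensis
n01704323 triceratops
"""
classes = parse_synset_words(excerpt)
wnid, name = classes[1]  # position 1 within this excerpt
print(wnid, name)
```

The resulting wnid can then be used in place of the hard-coded id in the download code.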
    For a single WordNet ID, the code could look like this:
    import csv
    import urllib.request

    wnid = "n01698640"
    url = "http://image-net.org/api/text/imagenet.synset.geturls?wnid=" + wnid

    # From https://stackoverflow.com/a/45358832/6064933
    # Save the list of image URLs of the synset to <wnid>.csv
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with open(wnid + ".csv", "wb") as f:
        with urllib.request.urlopen(req) as r:
            f.write(r.read())

    failed = []  # collect the dead links across all iterations
    with open(wnid + ".csv", "r") as f:
        counter = 1
        for line in f:
            image_url = line.strip()
            print(image_url)
            try:
                with urllib.request.urlopen(image_url) as r2:
                    with open(f"{wnid}_{counter:05}.jpg", "wb") as f2:
                        f2.write(r2.read())
            except Exception:
                failed.append(f"{counter:05}, {image_url}")
            counter += 1
            if counter == 10:  # stop after the first nine URLs
                break

    # Write the numbers and URLs of the failed downloads to <wnid>_failed.csv
    with open(wnid + "_failed.csv", "w", newline="") as f3:
        writer = csv.writer(f3)
        writer.writerow(failed)
    Result:
    [image]
  • If you even need the original-quality images behind the dead links, and your project is non-commercial, you can log in; see "How do I get a copy of the images?" in the Download FAQ.
  • In the URL above, you see wnid=n01698640 at the end of the URL, which is the WordNet id that is mapped to ImageNet.
  • Or, in the "Images of the Synset" tab, just click on "Wordnet IDs"

    [image]
    to get to:
    [image]
    or right-click and save as:
    [image]
    You can use the WordNet id to get the original images.
    [image]
    If you are a commercial user, I would suggest contacting the ImageNet team.

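    Add-on
    Taking up an idea from a comment: if you do not want many images, but just the "single class image" that represents the class as much as possible, have a look at Visualizing GoogLeNet Classes and try to use that method with the images of ImageNet instead. It also uses the deepdream code.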

    Visualizing GoogLeNet Classes

    1. July 2015

    Ever wondered what a deep neural network thinks a Dalmatian should look like? Well, wonder no more.

    Recently Google published a post describing how they managed to use deep neural networks to generate class visualizations and modify images through the so called “inceptionism” method. They later published the code to modify images via the inceptionism method yourself, however, they didn’t publish code to generate the class visualizations they show in the same post.

    While I never figured out exactly how Google generated their class visualizations, after butchering the deepdream code and this ipython notebook from Kyle McDonald, I managed to coach GoogLeNet into drawing these:

    [image]

    ... [with many other example images to follow]

    Source: the original question, "imagenet - How to get images of a selected class from ImageNet?", on Stack Overflow: https://stackoverflow.com/questions/49162455/
