TensorFlow: Loading an unknown TFRecord dataset


Reposted by: 行者123. Updated: 2023-12-04 01:47:26


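I was given a TFRecord data file, filename = train-00000-of-00001, which contains images of unknown size and possibly other information as well. I know I can open the dataset with dataset = tf.data.TFRecordDataset(filename).

How can I extract the images from this file and save them as numpy arrays?

I also don't know whether any other information, such as labels or resolution, is stored in the TFRecord file. How can I find that out? How can I save that information as numpy arrays?

I normally work only with numpy arrays and am not familiar with TFRecord data files.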

Best answer

1.) How can I extract the images from this file and save them as numpy arrays?

This is what you are looking for:

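import tensorflow as tf

# TensorFlow 1.x API; in TensorFlow 2.x the same function is tf.compat.v1.io.tf_record_iterator.
record_iterator = tf.python_io.tf_record_iterator(path=filename)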

for string_record in record_iterator:
  example = tf.train.Example()
  example.ParseFromString(string_record)

  print(example)

  # Exit after 1 iteration as this is purely demonstrative.
  break

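2.) How can I get this information?

Here is the official documentation. I strongly recommend reading it, because it walks step by step through how to extract the values you are looking for.

In essence, you have to convert example into a dictionary. So if I wanted to find out what kind of information is in the tfrecord file, I would do something like this (in the context of the code from the first question): dict(example.features.feature).keys()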
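The keys alone do not say whether a feature holds bytes, floats, or integers. The sketch below reports that as well for the first record. It assumes TensorFlow 2.x with eager execution (the answer itself uses the 1.x API), and describe_first_record is a hypothetical helper name, not part of TensorFlow.

```python
import tensorflow as tf


def describe_first_record(path):
    """Return {feature_name: (kind, number_of_values)} for the first record."""
    summary = {}
    for raw_record in tf.data.TFRecordDataset(path).take(1):
        example = tf.train.Example()
        example.ParseFromString(raw_record.numpy())
        for name, feature in example.features.feature.items():
            # Each Feature holds exactly one of bytes_list, float_list, int64_list.
            kind = feature.WhichOneof('kind')
            if kind is None:
                summary[name] = (None, 0)
            else:
                summary[name] = (kind, len(getattr(feature, kind).value))
    return summary
```

Calling describe_first_record(filename) on the file from the question would show, for each key, which list type to read and how many values it contains. This only works if the records are serialized tf.train.Example messages, which is an assumption about an unknown file.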

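3.) How can I save them as numpy arrays?

I would build on the for loop shown above. On each iteration, extract the values you are interested in and append them to numpy arrays. If you like, you can then create a pandas dataframe from those arrays and save it as a csv file.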
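A minimal sketch of that loop, assuming TensorFlow 2.x with eager execution and records that are serialized tf.train.Example messages. The helper names feature_to_numpy and records_to_columns are made up for illustration. Bytes features are kept as raw bytes, because images of unknown size cannot be stacked into one array before decoding.

```python
import numpy as np
import tensorflow as tf


def feature_to_numpy(feature):
    """Convert one tf.train.Feature to a numpy array (or raw bytes list)."""
    kind = feature.WhichOneof('kind')
    if kind == 'int64_list':
        return np.array(feature.int64_list.value, dtype=np.int64)
    if kind == 'float_list':
        return np.array(feature.float_list.value, dtype=np.float32)
    if kind == 'bytes_list':
        return list(feature.bytes_list.value)  # e.g. encoded images, decode later
    return None  # empty feature


def records_to_columns(path):
    """Collect every feature of every record into {name: [value per record]}."""
    columns = {}
    for raw_record in tf.data.TFRecordDataset(path):
        example = tf.train.Example()
        example.ParseFromString(raw_record.numpy())
        for name, feature in example.features.feature.items():
            columns.setdefault(name, []).append(feature_to_numpy(feature))
    return columns
```

If the file turned out to contain an integer feature called 'label' (a hypothetical key), np.save('labels.npy', np.concatenate(columns['label'])) would write it to disk as a single array.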

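But wait...

You seem to have multiple tfrecord files... tf.data.TFRecordDataset(filename) returns a dataset that is meant for training models.

So with multiple tfrecord files you will need a double for loop. The outer loop goes through each file; for that particular file, the inner loop goes through all of its tf.Examples.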
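The double loop could look roughly like this, again assuming TensorFlow 2.x with eager execution. iter_examples is a hypothetical helper name, and the list of file names is whatever shards you actually have.

```python
import tensorflow as tf


def iter_examples(filenames):
    """Yield (filename, tf.train.Example) for every record in every file."""
    for filename in filenames:                                 # outer loop: files
        for raw_record in tf.data.TFRecordDataset(filename):   # inner loop: records
            example = tf.train.Example()
            example.ParseFromString(raw_record.numpy())
            yield filename, example
```

Writing it as a generator keeps only one parsed example in memory at a time, so the caller decides which values to extract and accumulate.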

Edit:

Converting to np.array()

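import io

import numpy as np
import tensorflow as tf
from PIL import Image

# Same iterator as in the first snippet (TensorFlow 1.x API).
record_iterator = tf.python_io.tf_record_iterator(path=filename)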

for string_record in record_iterator:
  example = tf.train.Example()
  example.ParseFromString(string_record)

  print(example)

  # Get the values in a dictionary
  example_bytes = dict(example.features.feature)['image_raw'].bytes_list.value[0]
  image_array = np.array(Image.open(io.BytesIO(example_bytes)))
  print(image_array)
  break

Sources for the code above:

Base code
Converting bytes to PIL.JpegImagePlugin.JpegImageFile
Converting from PIL.JpegImagePlugin.JpegImageFile to np.array

Official PIL documentation

Edit 2:

import tensorflow as tf
from PIL import Image
import io
import numpy as np

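# Load image (the first argument is the local file name to cache the download under)
cat_in_snow  = tf.keras.utils.get_file('320px-Felis_catus-cat_on_snow.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg')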

#------------------------------------------------------Convert to tfrecords
def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def image_example(image_string):
  feature = {
      'image_raw': _bytes_feature(image_string),
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.python_io.TFRecordWriter('images.tfrecords') as writer:
    image_string = open(cat_in_snow, 'rb').read()
    tf_example = image_example(image_string)
    writer.write(tf_example.SerializeToString())
#------------------------------------------------------


#------------------------------------------------------Begin Operation
# Read back the file written above
record_iterator = tf.python_io.tf_record_iterator(path='images.tfrecords')

for string_record in record_iterator:
  example = tf.train.Example()
  example.ParseFromString(string_record)

  print(example)

  # OPTION 1: convert bytes to arrays using PIL and IO
  example_bytes = dict(example.features.feature)['image_raw'].bytes_list.value[0]
  PIL_array = np.array(Image.open(io.BytesIO(example_bytes)))

  # OPTION 2: convert bytes to arrays using Tensorflow
  with tf.Session() as sess:
      TF_array = sess.run(tf.image.decode_jpeg(example_bytes, channels=3))

  break
#------------------------------------------------------


#------------------------------------------------------Compare results
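# Number of array elements that differ between the two decodes
print((PIL_array.flatten() != TF_array.flatten()).sum())
# Element-wise comparison (boolean array)
print(PIL_array == TF_array)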

PIL_img = Image.fromarray(PIL_array, 'RGB')
PIL_img.save('PIL_IMAGE.jpg')

TF_img = Image.fromarray(TF_array, 'RGB')
TF_img.save('TF_IMAGE.jpg')
#------------------------------------------------------

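Keep in mind that tfrecords are just a way of storing information so that tensorflow models can read it efficiently.
I use PIL and IO to essentially convert the bytes into an image. IO takes the bytes and converts them into a file-like object that PIL.Image can then read.
Yes, there is a pure tensorflow way of doing it: tf.image.decode_jpeg
Yes, there is a difference between the two approaches when you compare the two arrays.
Which one should you pick? If accuracy is a concern, Tensorflow is not the way to go, as stated on Tensorflow's github: "The default settings TensorFlow chooses for jpeg decoding is IFAST, sacrificing image quality for speed". Credit for this information belongs to this post.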
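tf.image.decode_jpeg also accepts a dct_method argument, and passing 'INTEGER_ACCURATE' asks for the slower, more accurate decoder. The sketch below counts how many array elements differ from the PIL decode for a given setting. It assumes TensorFlow 2.x with eager execution, and count_mismatches is a hypothetical helper name. Whether the accurate setting removes the difference entirely for a particular image has not been checked here.

```python
import io

import numpy as np
import tensorflow as tf
from PIL import Image


def count_mismatches(jpeg_bytes, dct_method=''):
    """Count array elements where the PIL and TensorFlow decodes differ."""
    pil_array = np.array(Image.open(io.BytesIO(jpeg_bytes)).convert('RGB'))
    tf_array = tf.image.decode_jpeg(
        jpeg_bytes, channels=3, dct_method=dct_method).numpy()
    return int((pil_array != tf_array).sum())
```

Comparing count_mismatches(example_bytes) with count_mismatches(example_bytes, 'INTEGER_ACCURATE') shows how much of the discrepancy comes from the decoder setting.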

This article on TensorFlow: Loading an unknown TFRecord dataset is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54716696/
