
python - How to store a numpy array as a tfrecord?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:13:23

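I am trying to create a dataset in tfrecord format from numpy arrays. I want to store 2d and 3d coordinates.

The 2d coordinates are numpy arrays of shape (2,10) and type float64.
The 3d coordinates are numpy arrays of shape (3,10) and type float64.

This is my code: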

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


train_filename = 'train.tfrecords'  # address to save the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)


for c in range(0,1000):

    #get 2d and 3d coordinates and save in c2d and c3d

    feature = {'train/coord2d': _floats_feature(c2d),
               'train/coord3d': _floats_feature(c3d)}
    sample = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(sample.SerializeToString())

writer.close()

When I run it, I get this error:

  feature = {'train/coord2d': _floats_feature(c2d),
File "genData.py", line 19, in _floats_feature
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\python_message.py", line 510, in init
copy.extend(field_value)
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in extend
new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in <listcomp>
new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\type_checkers.py", line 109, in CheckValue
raise TypeError(message)
TypeError: array([-163.685, 240.818, -114.05 , -518.554, 107.968, 427.184,
157.418, -161.798, 87.102, 406.318]) has type <class 'numpy.ndarray'>, but expected one of: ((<class 'numbers.Real'>,),)

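I don't know how to fix this. Should I store the features as int64 or as bytes? I have no idea how to go about it, since I am completely new to tensorflow. Any help would be great! Thanks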

Best answer

The function _float_feature described in the Tensorflow-Guide expects a scalar (either float32 or float64) as input.

def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

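As you can see, the input scalar is wrapped in a list (value=[value]), which is then passed to tf.train.FloatList. tf.train.FloatList expects an iterable that yields a float on each iteration (a list, for example).

If your feature is a vector rather than a scalar, _float_feature can be rewritten to pass the iterable directly to tf.train.FloatList (instead of wrapping it in a list first).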

def _float_array_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

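However, this solution no longer works if your feature has two or more dimensions. That is what happens in the question: iterating over a (2,10) array yields one-dimensional rows rather than floats, hence the TypeError. As @mmry described in his answer, flattening the feature or splitting it into several one-dimensional features is one solution in this case. The drawback of both approaches is that the information about the feature's actual shape is lost unless you put in extra effort to keep it.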
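The flattening approach can keep the shape if the shape is written as a second feature. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution; the feature keys coord2d and coord2d_shape, the helper _int64_array_feature, and the sample data are made up for illustration. Note that tf.train.FloatList holds 32-bit floats, so float64 input is read back as float32.

```python
import numpy as np
import tensorflow as tf


def _float_array_feature(value):
    """Returns a float_list from a 1-D iterable of floats."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def _int64_array_feature(value):
    """Returns an int64_list from a 1-D iterable of ints (hypothetical helper)."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


# Hypothetical sample: one (2, 10) float64 coordinate array.
coord2d = np.arange(20, dtype=np.float64).reshape(2, 10)

# Write: flatten the array and store its shape next to it.
feature = {
    'coord2d': _float_array_feature(coord2d.ravel()),
    'coord2d_shape': _int64_array_feature(coord2d.shape),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
serialized = example.SerializeToString()

# Read: parse the flat values and the shape, then reshape.
parse_dic = {
    'coord2d': tf.io.VarLenFeature(tf.float32),             # flat values, length unknown
    'coord2d_shape': tf.io.FixedLenFeature([2], tf.int64),  # two dimensions
}
parsed = tf.io.parse_single_example(serialized, parse_dic)
flat = tf.sparse.to_dense(parsed['coord2d'])
restored = tf.reshape(flat, parsed['coord2d_shape'])  # float32 tensor of shape (2, 10)
```

If every array has the same known shape, the shape feature can be dropped and the values parsed with tf.io.FixedLenFeature([2, 10], tf.float32) instead.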

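Another way to write an example message for a higher-dimensional array is to convert the array into a byte string and then use the _bytes_feature function described in the Tensorflow-Guide to write the example message for it. The example message is then serialized and written to a TFRecord file.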

import tensorflow as tf
import numpy as np

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):  # if value is a tensor
        value = value.numpy()  # get value of tensor
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def serialize_array(array):
    array = tf.io.serialize_tensor(array)
    return array


#----------------------------------------------------------------------------------
# Create example data
array_blueprint = np.arange(4, dtype='float64').reshape(2,2)
arrays = [array_blueprint+1, array_blueprint+2, array_blueprint+3]

#----------------------------------------------------------------------------------
# Write TFrecord file
file_path = 'data.tfrecords'
with tf.io.TFRecordWriter(file_path) as writer:
    for array in arrays:
        serialized_array = serialize_array(array)
        feature = {'b_feature': _bytes_feature(serialized_array)}
        example_message = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example_message.SerializeToString())

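The serialized example messages stored in the TFRecord file can be accessed via tf.data.TFRecordDataset. After the example messages have been parsed, the original array has to be restored from the byte string it was converted to. This can be done with tf.io.parse_tensor.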

# Read TFRecord file
def _parse_tfr_element(element):
    parse_dic = {
        'b_feature': tf.io.FixedLenFeature([], tf.string),  # Note that it is tf.string, not tf.float32
    }
    example_message = tf.io.parse_single_example(element, parse_dic)

    b_feature = example_message['b_feature']  # get byte string
    feature = tf.io.parse_tensor(b_feature, out_type=tf.float64)  # restore 2D array from byte string
    return feature


tfr_dataset = tf.data.TFRecordDataset('data.tfrecords')
for serialized_instance in tfr_dataset:
    print(serialized_instance)  # print serialized example messages

dataset = tfr_dataset.map(_parse_tfr_element)
for instance in dataset:
    print()
    print(instance)  # print parsed example messages with restored arrays
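Applied to the data from the question, the byte-string approach might look like the sketch below. It assumes TensorFlow 2.x; the random sample data, the temporary file location, and the function name _parse_coords are invented for illustration, while the feature keys are the ones from the question. Because tf.io.parse_tensor does not know the static shape inside a dataset map, tf.ensure_shape is used here to set it again.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):  # if value is a tensor
        value = value.numpy()  # get value of tensor
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


# Hypothetical data: three samples of (2, 10) and (3, 10) float64 coordinates.
rng = np.random.default_rng(0)
samples = [(rng.normal(size=(2, 10)), rng.normal(size=(3, 10))) for _ in range(3)]

# Write both arrays of each sample into one example message.
file_path = os.path.join(tempfile.mkdtemp(), 'coords.tfrecords')
with tf.io.TFRecordWriter(file_path) as writer:
    for c2d, c3d in samples:
        feature = {
            'train/coord2d': _bytes_feature(tf.io.serialize_tensor(c2d)),
            'train/coord3d': _bytes_feature(tf.io.serialize_tensor(c3d)),
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())


def _parse_coords(element):
    parse_dic = {
        'train/coord2d': tf.io.FixedLenFeature([], tf.string),
        'train/coord3d': tf.io.FixedLenFeature([], tf.string),
    }
    parsed = tf.io.parse_single_example(element, parse_dic)
    c2d = tf.io.parse_tensor(parsed['train/coord2d'], out_type=tf.float64)
    c3d = tf.io.parse_tensor(parsed['train/coord3d'], out_type=tf.float64)
    # Restore the static shapes, which parse_tensor cannot infer.
    return tf.ensure_shape(c2d, (2, 10)), tf.ensure_shape(c3d, (3, 10))


dataset = tf.data.TFRecordDataset(file_path).map(_parse_coords)
restored = [(c2d.numpy(), c3d.numpy()) for c2d, c3d in dataset]
```

Unlike tf.train.FloatList, this route keeps the float64 precision, since the tensor is serialized with its original dtype.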

Regarding "python - How to store a numpy array as a tfrecord?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47861084/
