
tensorflow - In the TensorFlow Dataset API: How to use padded_batch so that it pads with a specific value without specifying the number of pads

Reposted. Author: 行者123. Updated: 2023-12-03 16:24:12

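If you don't specify padding_values, padded_batch pads with 0 automatically. However, if you want a different value, such as -1, you can't just set padding_values = -1. You need to supply a sequence with an entry for every slot that needs padding.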

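However, I am working with a dataset whose arrays have random lengths, so I can't really do that, because I don't know how many numbers will need to be padded.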

By default padded_batch fills the remaining positions with 0, and I am hoping there is some way to do the same with a different value (for example -1).

Here is a minimal example:

import math
import numpy as np
import tensorflow as tf

# The rows have different lengths, so build an object array explicitly
cells = np.array([[0,1,2,3], [2,3,4], [3,6,5,4,3], [3,9]], dtype=object)
mells = np.array([[0], [2], [3], [9]])
print(cells)

# Write one tf.train.Example per row to a TFRecord file
writer = tf.python_io.TFRecordWriter('test.tfrecords')
for index in range(mells.shape[0]):
    example = tf.train.Example(features=tf.train.Features(feature={
        'num_value': tf.train.Feature(int64_list=tf.train.Int64List(value=mells[index])),
        'list_value': tf.train.Feature(int64_list=tf.train.Int64List(value=cells[index]))
    }))
    writer.write(example.SerializeToString())
writer.close()

# Generate samples with a batch size of 2

filenames = ["test.tfrecords"]
dataset = tf.data.TFRecordDataset(filenames)

def _parse_function(example_proto):
    keys_to_features = {'num_value': tf.VarLenFeature(tf.int64),
                        'list_value': tf.VarLenFeature(tf.int64)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse.to_dense(parsed_features['num_value']), \
           tf.sparse.to_dense(parsed_features['list_value'])

# Parse the record into tensors.
dataset = dataset.map(_parse_function)
# Shuffle the dataset
dataset = dataset.shuffle(buffer_size=1)
# Repeat the input indefinitely
dataset = dataset.repeat()
# Generate batches (this is the call that fails)
dataset = dataset.padded_batch(2, padded_shapes=([None],[None]), padding_values=-1)
# Create a one-shot iterator
iterator = dataset.make_one_shot_iterator()
i, data = iterator.get_next()

Here is the error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-65494605bf11> in <module>()
14 dataset = dataset.repeat()
15 # Generate batches
---> 16 dataset = dataset.padded_batch(2, padded_shapes=([None],[None]), padding_values=-1)
17 # Create a one-shot iterator
18 iterator = dataset.make_one_shot_iterator()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in padded_batch(self, batch_size, padded_shapes, padding_values, drop_remainder)
943 """
944 return PaddedBatchDataset(self, batch_size, padded_shapes, padding_values,
--> 945 drop_remainder)
946
947 def map(self, map_func, num_parallel_calls=None):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, input_dataset, batch_size, padded_shapes, padding_values, drop_remainder)
2526 self._padding_values = nest.map_structure_up_to(
2527 input_dataset.output_shapes, _padding_value_to_tensor, padding_values,
-> 2528 input_dataset.output_types)
2529 self._drop_remainder = ops.convert_to_tensor(
2530 drop_remainder, dtype=dtypes.bool, name="drop_remainder")

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/nest.py in map_structure_up_to(shallow_tree, func, *inputs)
465 raise ValueError("Cannot map over no sequences")
466 for input_tree in inputs:
--> 467 assert_shallow_structure(shallow_tree, input_tree)
468
469 # Flatten each input separately, apply the function to corresponding elements,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/nest.py in assert_shallow_structure(shallow_tree, input_tree, check_types)
299 raise TypeError(
300 "If shallow structure is a sequence, input must also be a sequence. "
--> 301 "Input has type: %s." % type(input_tree))
302
303 if check_types and not isinstance(input_tree, type(shallow_tree)):

TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: <class 'int'>.

The problematic line is:
# Generate batches
dataset = dataset.padded_batch(2, padded_shapes=([None],[None]), padding_values=-1)

If I remove padding_values, it generates zero-padded batches without any problem:
with tf.Session() as sess:
    print(sess.run([i, data]))
    print(sess.run([i, data]))

[array([[0],
       [2]]), array([[0, 1, 2, 3],
       [2, 3, 4, 0]])]
[array([[3],
       [9]]), array([[3, 6, 5, 4, 3],
       [3, 9, 0, 0, 0]])]

Best answer

You should change padding_values so that it supplies one padding value per component instead of a single int:

dataset = dataset.padded_batch(2, padded_shapes=([None], [None]),
                               padding_values=(tf.constant(-1, dtype=tf.int64),
                                               tf.constant(-1, dtype=tf.int64)))
with tf.Session() as sess:
    print(sess.run([i, data]))
    print(sess.run([i, data]))

[array([[0],
       [2]]), array([[ 0,  1,  2,  3],
       [ 2,  3,  4, -1]])]
[array([[3],
       [9]]), array([[ 3,  6,  5,  4,  3],
       [ 3,  9, -1, -1, -1]])]
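
Note that the number of pads is never specified anywhere: each batch is padded up to the length of its longest element. As a rough illustration of that behaviour only (plain Python, not TensorFlow's implementation; pad_batch and padded_batches are hypothetical helper names introduced here), the same result can be sketched as:

```python
def pad_batch(sequences, pad_value):
    """Pad every sequence to the length of the longest one in this batch."""
    target_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (target_len - len(seq)) for seq in sequences]


def padded_batches(sequences, batch_size, pad_value):
    """Yield consecutive batches, each padded independently of the others."""
    for start in range(0, len(sequences), batch_size):
        yield pad_batch(sequences[start:start + batch_size], pad_value)


# Same variable-length rows as list_value in the question
cells = [[0, 1, 2, 3], [2, 3, 4], [3, 6, 5, 4, 3], [3, 9]]
batches = list(padded_batches(cells, batch_size=2, pad_value=-1))
print(batches)
```

The first batch is padded to length 4 and the second to length 5, which is why only the pad value, and never a pad count, has to be supplied.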

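Explanation
Each entry in padding_values is the padding value used for the corresponding component of the dataset. This means padding_values must have the same length as padded_shapes. Here the second entry is used to pad every list_value array up to the full length of the batch, while the first component (num_value) already has the same length in every example, so it never needs to be padded with -1. For example, with a different value for each component: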
dataset = dataset.padded_batch(2, padded_shapes=([None], [None]),
                               padding_values=(tf.constant(-1, dtype=tf.int64),
                                               tf.constant(-2, dtype=tf.int64)))
with tf.Session() as sess:
    print(sess.run([i, data]))
    print(sess.run([i, data]))

[array([[0],
       [2]]), array([[ 0,  1,  2,  3],
       [ 2,  3,  4, -2]])]
[array([[3],
       [9]]), array([[ 3,  6,  5,  4,  3],
       [ 3,  9, -2, -2, -2]])]
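
The code above uses the TensorFlow 1.x session API. For readers on TensorFlow 2.x, the following is a minimal sketch of the same idea, assuming TensorFlow 2.4 or later (for output_signature) and eager execution. It replaces the TFRecord file with an in-memory generator purely to stay self-contained; gen, batched and batches are names introduced here, not part of the original answer. The padding values are built from dataset.element_spec so that each one automatically gets the dtype of its component.

```python
import tensorflow as tf

# Same data as in the question, kept in memory instead of a TFRecord file
cells = [[0, 1, 2, 3], [2, 3, 4], [3, 6, 5, 4, 3], [3, 9]]
mells = [[0], [2], [3], [9]]


def gen():
    """Yield (num_value, list_value) pairs of varying length."""
    for num_value, list_value in zip(mells, cells):
        yield num_value, list_value


dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=[None], dtype=tf.int64),
        tf.TensorSpec(shape=[None], dtype=tf.int64),
    ),
)

# One scalar padding value per component, matching each component's dtype
padding_values = tf.nest.map_structure(
    lambda spec: tf.constant(-1, dtype=spec.dtype), dataset.element_spec
)

batched = dataset.padded_batch(
    2, padded_shapes=([None], [None]), padding_values=padding_values
)

# Eager iteration replaces the one-shot iterator and tf.Session
batches = [(num.numpy().tolist(), lst.numpy().tolist()) for num, lst in batched]
print(batches)
```

Deriving the tuple from element_spec keeps padding_values in the same structure as the dataset, which is exactly the requirement that the single int -1 violated in the question.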

Regarding "tensorflow - In the TensorFlow Dataset API: How to use padded_batch so that it pads with a specific value without specifying the number of pads", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53938962/
