tensorflow - tf.parse_example for sequences of sequence data

Reposted. Author: 行者123. Updated: 2023-12-04 00:53:25
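My TensorFlow model takes a sequence of sequences for each example: the character tokens of each word in a sequence of words (for example, [[3], [4, 3], [6, 1, 20]]). Previously I handled this by padding the data into a 3D numpy array of shape [batch_size, max_words_len, max_chars_len] and feeding it to a placeholder.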

in_question_chars = tf.placeholder(tf.int32,
                                   [None, None, None],
                                   name="in_question_chars")
# example of other data
in_question_words = tf.placeholder(tf.int32,
                                   [None, None],
                                   name="in_question_words")
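
For readers who want to see the padding step spelled out, here is a minimal sketch in plain Python. The helper name `pad_char_ids`, the `pad_id` parameter, and the second example in the batch are assumptions made for illustration; they do not come from the original post.

```python
# Hypothetical helper (not from the original post): pad ragged
# [example][word][char] id lists into a rectangular nested list.

def pad_char_ids(batch, pad_id=0):
    """Pad a batch to [batch_size, max_words_len, max_chars_len]."""
    max_words_len = max((len(words) for words in batch), default=0)
    max_chars_len = max(
        (len(chars) for words in batch for chars in words), default=0)

    padded = []
    for words in batch:
        # Right-pad every word to max_chars_len.
        rows = [list(chars) + [pad_id] * (max_chars_len - len(chars))
                for chars in words]
        # Add all-padding rows for the words this example is missing.
        rows += [[pad_id] * max_chars_len
                 for _ in range(max_words_len - len(words))]
        padded.append(rows)
    return padded


batch = [
    [[3], [4, 3], [6, 1, 20]],  # the example from the question: 3 words
    [[7, 7]],                   # a made-up shorter example: 1 word
]
padded = pad_char_ids(batch)
print(padded)
```

The nested list can then be turned into an array (for instance with `np.asarray(padded, dtype=np.int32)`) before it is fed to the `in_question_chars` placeholder.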

But now I want to use Google Cloud Machine Learning Engine for online prediction/deployment. Based on the TensorFlow Serving example: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py

I came up with something like this, but I really don't know what to use to parse a sequence of sequences of character tokens:

serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {
    'in_question_chars': tf.FixedLenSequenceFeature(shape=[None],
                                                    allow_missing=True,
                                                    dtype=tf.int32,
                                                    default_value=0),
    'in_question_words': tf.FixedLenSequenceFeature(shape=[],
                                                    allow_missing=True,
                                                    dtype=tf.int32,
                                                    default_value=0)
}

tf_example = tf.parse_example(serialized_tf_example, feature_configs)

in_question_chars = tf.identity(tf_example['in_question_chars'],
                                name='in_question_chars')
# example of other data
in_question_words = tf.identity(tf_example['in_question_words'],
                                name='in_question_words')

Should I use VarLenFeature, which turns the data into a SparseTensor (even though it isn't really sparse), and then use tf.sparse_tensor_to_dense to convert it back to a dense tensor?

In the next step, I look up the embedding for each character token.

in_question_char_repres = tf.nn.embedding_lookup(char_embedding,
                                                 in_question_chars)

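So another option would be to keep it as a SparseTensor and use tf.nn.embedding_lookup_sparse.

I can't find an example of how this should be done. Please let me know what the best practice is. Thanks!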


Edit, 25 August 2017

It doesn't seem to allow me to set None for the 2nd dimension.

Here is a stripped-down version of my code:

def read_dataset(filename, mode=tf.contrib.learn.ModeKeys.TRAIN):
    def _input_fn():
        num_epochs = MAX_EPOCHS if mode == tf.contrib.learn.ModeKeys.TRAIN else 1

        input_file_names = tf.train.match_filenames_once(str(filename))

        filename_queue = tf.train.string_input_producer(
            input_file_names, num_epochs=num_epochs, shuffle=True)
        reader = tf.TFRecordReader()
        _, serialized = reader.read_up_to(filename_queue, num_records=batch_size)

        features_spec = {
            CORRECT_CHILD_NODE_IDX: tf.FixedLenFeature(shape=[],
                                                       dtype=tf.int64,
                                                       default_value=0),
            QUESTION_LENGTHS: tf.FixedLenFeature(shape=[], dtype=tf.int64),
            IN_QUESTION_WORDS: tf.FixedLenSequenceFeature(shape=[],
                                                          allow_missing=True,
                                                          dtype=tf.int64),
            QUESTION_CHAR_LENGTHS: tf.FixedLenSequenceFeature(shape=[],
                                                              allow_missing=True,
                                                              dtype=tf.int64),
            IN_QUESTION_CHARS: tf.FixedLenSequenceFeature(shape=[None],
                                                          allow_missing=True,
                                                          dtype=tf.int64)
        }
        examples = tf.parse_example(serialized, features=features_spec)

        label = examples[CORRECT_CHILD_NODE_IDX]
        return examples, label  # dict of features, label
    return _input_fn

When the shape contains None, it gives me this error:

    INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f57fc309c18>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'outputdir'}
WARNING:tensorflow:From /home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py:269: BaseMonitor.__init__ (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
653 graph_def_version, node_def_str, input_shapes, input_tensors,
--> 654 input_tensors_as_shapes, status)
655 except errors.InvalidArgumentError as err:

/home/jupyter-admin/anaconda3/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
88 try:
---> 89 next(self.gen)
90 except StopIteration:

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:

InvalidArgumentError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
<ipython-input-45-392858a0e7b4> in <module>()
48
49 shutil.rmtree('outputdir', ignore_errors=True) # start fresh each time
---> 50 learn_runner.run(experiment_fn, 'outputdir')

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in run(experiment_fn, output_dir, schedule, run_config, hparams)
207 schedule = schedule or _get_default_schedule(run_config)
208
--> 209 return _execute_schedule(experiment, schedule)
210
211

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py in _execute_schedule(experiment, schedule)
44 logging.error('Allowed values for this experiment are: %s', valid_tasks)
45 raise TypeError('Schedule references non-callable member %s' % schedule)
---> 46 return task()
47
48

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train_and_evaluate(self)
500 name=eval_dir_suffix, hooks=self._eval_hooks
501 )]
--> 502 self.train(delay_secs=0)
503
504 eval_result = self._call_evaluate(input_fn=self._eval_input_fn,

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in train(self, delay_secs)
278 return self._call_train(input_fn=self._train_input_fn,
279 max_steps=self._train_steps,
--> 280 hooks=self._train_monitors + extra_hooks)
281
282 def evaluate(self, delay_secs=None, name=None):

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py in _call_train(self, _sentinel, input_fn, steps, hooks, max_steps)
675 steps=steps,
676 max_steps=max_steps,
--> 677 monitors=hooks)
678
679 def _call_evaluate(self, _sentinel=None, # pylint: disable=invalid-name,

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
294 'in a future version' if date is None else ('after %s' % date),
295 instructions)
--> 296 return func(*args, **kwargs)
297 return tf_decorator.make_decorator(func, new_func, 'deprecated',
298 _add_deprecated_arg_notice_to_docstring(

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in fit(self, x, y, input_fn, steps, batch_size, monitors, max_steps)
456 hooks.append(basic_session_run_hooks.StopAtStepHook(steps, max_steps))
457
--> 458 loss = self._train_model(input_fn=input_fn, hooks=hooks)
459 logging.info('Loss for final step: %s.', loss)
460 return self

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py in _train_model(self, input_fn, hooks)
954 random_seed.set_random_seed(self._config.tf_random_seed)
955 global_step = contrib_framework.create_global_step(g)
--> 956 features, labels = input_fn()
957 self._check_inputs(features, labels)
958 model_fn_ops = self._get_train_ops(features, labels)

<ipython-input-44-fdb63ed72b90> in _input_fn()
35 )
36 }
---> 37 examples = tf.parse_example(serialized, features=features_spec)
38
39 label = examples[CORRECT_CHILD_NODE_IDX]

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in parse_example(serialized, features, name, example_names)
573 outputs = _parse_example_raw(
574 serialized, example_names, sparse_keys, sparse_types, dense_keys,
--> 575 dense_types, dense_defaults, dense_shapes, name)
576 return _construct_sparse_tensors_for_sparse_features(features, outputs)
577

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/parsing_ops.py in _parse_example_raw(serialized, names, sparse_keys, sparse_types, dense_keys, dense_types, dense_defaults, dense_shapes, name)
698 dense_keys=dense_keys,
699 dense_shapes=dense_shapes,
--> 700 name=name)
701 # pylint: enable=protected-access
702

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_parsing_ops.py in _parse_example(serialized, names, sparse_keys, dense_keys, dense_defaults, sparse_types, dense_shapes, name)
174 dense_defaults=dense_defaults,
175 sparse_types=sparse_types,
--> 176 dense_shapes=dense_shapes, name=name)
177 return _ParseExampleOutput._make(result)
178

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py in apply_op(self, op_type_name, name, **keywords)
765 op = g.create_op(op_type_name, inputs, output_types, name=scope,
766 input_types=input_types, attrs=attr_protos,
--> 767 op_def=op_def)
768 if output_structure:
769 outputs = op.outputs

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
2630 original_op=self._default_original_op, op_def=op_def)
2631 if compute_shapes:
-> 2632 set_shapes_for_outputs(ret)
2633 self._add_op(ret)
2634 self._record_op_seen_by_control_dependencies(ret)

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
1909 shape_func = _call_cpp_shape_fn_and_require_op
1910
-> 1911 shapes = shape_func(op)
1912 if shapes is None:
1913 raise RuntimeError(

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in call_with_requiring(op)
1859
1860 def call_with_requiring(op):
-> 1861 return call_cpp_shape_fn(op, require_shape_fn=True)
1862
1863 _call_cpp_shape_fn_and_require_op = call_with_requiring

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in call_cpp_shape_fn(op, require_shape_fn)
593 res = _call_cpp_shape_fn_impl(op, input_tensors_needed,
594 input_tensors_as_shapes_needed,
--> 595 require_shape_fn)
596 if not isinstance(res, dict):
597 # Handles the case where _call_cpp_shape_fn_impl calls unknown_shape(op).

/home/jupyter-admin/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py in _call_cpp_shape_fn_impl(op, input_tensors_needed, input_tensors_as_shapes_needed, require_shape_fn)
657 missing_shape_fn = True
658 else:
--> 659 raise ValueError(err.message)
660
661 if missing_shape_fn:

ValueError: dense_shapes[2] has unknown rank or unknown inner dimensions: [?,?] for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [?], [0], [], [], [], [], [], [], [], [], [], [0], [1], [], [], [0], [], [0], [0], [0].

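For now, I work around this by setting the second dimension to max_char_length and concatenating everything into a 1D array, which turns the 2D sequence into a 1D sequence. I keep only the first max_char_length characters of a word if it is longer than max_char_length, and pad it with zeros if it is shorter. This seems to work, but maybe there is a way to accept variable-length sequences in the second dimension and do the padding in tf.parse_example or tf.train.batch.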
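
The workaround just described can be sketched in plain Python as follows. The function names `flatten_words` and `unflatten_words` and the `pad_id` parameter are hypothetical, chosen for illustration; the asker's actual code is not shown in the post.

```python
# Hypothetical sketch of the truncate/pad-and-flatten workaround.

def flatten_words(words, max_char_length, pad_id=0):
    """Clip or zero-pad each word to max_char_length, then concatenate."""
    flat = []
    for chars in words:
        clipped = list(chars[:max_char_length])
        flat.extend(clipped + [pad_id] * (max_char_length - len(clipped)))
    return flat


def unflatten_words(flat, max_char_length):
    """Recover the [num_words, max_char_length] layout from the 1D list."""
    return [flat[i:i + max_char_length]
            for i in range(0, len(flat), max_char_length)]


words = [[3], [4, 3], [6, 1, 20]]
flat = flatten_words(words, max_char_length=2)
restored = unflatten_words(flat, max_char_length=2)
print(flat)      # 1D list whose length is num_words * max_char_length
print(restored)  # the third word has lost its last character (20)
```

Because the flattened length is always a multiple of max_char_length, the 1D sequence can be stored as a scalar-per-step sequence feature and reshaped back on the model side; the cost is that characters beyond max_char_length are dropped.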

Best answer

Edit: fixed a confusing/incorrect answer =)

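So what you want is a tf.SequenceExample, parsed with tf.parse_single_sequence_example rather than tf.parse_example. This lets each Feature in a feature_list of the example be one element of the sequence; in this case, each Feature can be a VarLenFeature holding the variable number of characters in a word. Unfortunately, this does not work when you want to pass multiple sentences, so we have to do some hacking with higher-order functions and tf.sparse_concat:

I made a test program here: https://gist.github.com/elibixby/1c7a2497f96a457130241c59c676ebd4

The input (before being serialized into a batch of SequenceExamples) looks like this: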

[[[5, 10], [5, 10, 20]],
[[0, 1, 2], [2, 1, 0], [0, 1, 2, 3]]]

The resulting SparseTensor looks like this:

SparseTensorValue(indices=array([[[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 1, 0],
[1, 1, 1],
[1, 1, 2],
[1, 2, 0],
[1, 2, 1],
[1, 2, 2],
[1, 2, 3]]]), values=array([[ 5, 10, 5, 10, 20, 0, 1, 2, 2, 1, 0, 0, 1, 2, 3]]), dense_shape=array([[2, 3, 4]]))

This appears to be a SparseTensor where index = [sentence, word, letter].
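
To make the [sentence, word, letter] index layout concrete, the sketch below rebuilds the same indices, values, and dense shape from the nested input in plain Python and then scatters them into a zero-padded dense structure. It only mimics what the sparse representation encodes; it is not the TensorFlow implementation, and the names `to_sparse` and `to_dense` are hypothetical.

```python
# Plain-Python illustration of the sparse layout; not TensorFlow code.

def to_sparse(batch):
    """Return (indices, values, dense_shape) for [sentence][word][letter]."""
    indices, values = [], []
    for s, sentence in enumerate(batch):
        for w, word in enumerate(sentence):
            for c, char_id in enumerate(word):
                indices.append([s, w, c])
                values.append(char_id)
    dense_shape = [
        len(batch),
        max((len(sentence) for sentence in batch), default=0),
        max((len(word) for sentence in batch for word in sentence),
            default=0),
    ]
    return indices, values, dense_shape


def to_dense(indices, values, dense_shape, default=0):
    """Scatter values into a nested list filled with `default`."""
    n_sent, n_word, n_char = dense_shape
    dense = [[[default] * n_char for _ in range(n_word)]
             for _ in range(n_sent)]
    for (s, w, c), value in zip(indices, values):
        dense[s][w][c] = value
    return dense


batch = [[[5, 10], [5, 10, 20]],
         [[0, 1, 2], [2, 1, 0], [0, 1, 2, 3]]]
indices, values, dense_shape = to_sparse(batch)
dense = to_dense(indices, values, dense_shape)
print(dense_shape)
print(dense)
```

Positions that have no entry in the sparse representation come out as the default value, which is how the padding appears once the tensor is converted to dense form.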

Regarding "tensorflow - tf.parse_example for sequences of sequence data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45783533/
