
python - TensorFlow - `keys` or `default_value` doesn't match the table data types

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:51:28

(Complete newcomer to Python, machine learning, and TensorFlow)

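I am trying to adapt the TensorFlow Linear Model Tutorial from the official documentation to the Abalone dataset featured in the UCI Machine Learning Repository. The goal is to guess the number of rings (the age) of an abalone from the other given data.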

When I run the program below, I get the following:

  File "/home/lawrence/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/ops/lookup_ops.py", line 220, in lookup
    (self._key_dtype, keys.dtype))
TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int32'>.

The error is thrown at line 220 of lookup_ops.py, and is documented as being raised in the following case:

    Raises:
      TypeError: when `keys` or `default_value` doesn't match the table data types.

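From debugging parse_csv(), it appears that all tensors are created with the correct types.

Could you explain what is going wrong? I believe I am following the logic of the tutorial code, but I can't work this out.

Source code: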

import tensorflow as tf
import shutil

_CSV_COLUMNS = [
    'sex', 'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight', 'rings'
]

_CSV_COLUMN_DEFAULTS = [['M'], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0]]

_NUM_EXAMPLES = {
    'train': 3000,
    'validation': 1177,
}

def build_model_columns():
    """Builds a set of wide feature columns."""
    # Continuous columns
    sex = tf.feature_column.categorical_column_with_hash_bucket('sex', hash_bucket_size=1000)
    length = tf.feature_column.numeric_column('length', dtype=tf.float32)
    diameter = tf.feature_column.numeric_column('diameter', dtype=tf.float32)
    height = tf.feature_column.numeric_column('height', dtype=tf.float32)
    whole_weight = tf.feature_column.numeric_column('whole_weight', dtype=tf.float32)
    shucked_weight = tf.feature_column.numeric_column('shucked_weight', dtype=tf.float32)
    viscera_weight = tf.feature_column.numeric_column('viscera_weight', dtype=tf.float32)
    shell_weight = tf.feature_column.numeric_column('shell_weight', dtype=tf.float32)

    base_columns = [sex, length, diameter, height, whole_weight,
                    shucked_weight, viscera_weight, shell_weight]

    return base_columns

def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()

    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        label_vocabulary=_CSV_COLUMNS)


def input_fn(data_file, num_epochs, shuffle, batch_size):
    """Generate an input function for the Estimator."""
    assert tf.gfile.Exists(data_file), (
        '%s not found. Please make sure you have either run data_download.py or '
        'set both arguments --train_data and --test_data.' % data_file)

    def parse_csv(value):
        print('Parsing', data_file)
        columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
        features = dict(zip(_CSV_COLUMNS, columns))
        labels = features.pop('rings')

        return features, labels

    # Extract lines from input files using the Dataset API.
    dataset = tf.data.TextLineDataset(data_file)

    if shuffle:
        dataset = dataset.shuffle(buffer_size=_NUM_EXAMPLES['train'])

    dataset = dataset.map(parse_csv)

    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)

    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()

    return features, labels

def main(unused_argv):
    # Clean up the model directory if present
    shutil.rmtree("/home/lawrence/models/albones/", ignore_errors=True)
    model = build_estimator()

    # Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
    for n in range(40 // 2):
        model.train(input_fn=lambda: input_fn(
            "/home/lawrence/abalone.data", 2, True, 40))

        results = model.evaluate(input_fn=lambda: input_fn(
            "/home/lawrence/abalone.data", 1, False, 40))

        # Display evaluation metrics
        print('Results at epoch', (n + 1) * 2)
        print('-' * 60)

        for key in sorted(results):
            print('%s: %s' % (key, results[key]))


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main=main)

This is the breakdown of the dataset columns from abalone.names:

Name            Data Type   Meas.   Description
----            ---------   -----   -----------
Sex             nominal             M, F, [or] I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      perpendicular to length
Height          continuous  mm      with meat in shell
Whole weight    continuous  grams   whole abalone
Shucked weight  continuous  grams   weight of meat
Viscera weight  continuous  grams   gut weight (after bleeding)
Shell weight    continuous  grams   after being dried
Rings           integer             +1.5 gives the age in years

The dataset entries appear in this order as comma-separated values, with each new entry on its own line.
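As a rough illustration of what the record defaults imply for each column's type, the following plain-Python sketch mimics the parsing step without TensorFlow. The `parse_row` helper and the sample row are assumptions made for this example; the real pipeline uses `tf.decode_csv`, which takes each column's dtype from `_CSV_COLUMN_DEFAULTS`.

```python
import csv

CSV_COLUMNS = [
    'sex', 'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight', 'rings'
]
CSV_COLUMN_DEFAULTS = [['M'], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0]]


def parse_row(line):
    """Hypothetical helper: split one CSV line and cast each field to the
    type of its default, falling back to the default for empty fields."""
    fields = next(csv.reader([line]))
    row = {}
    for name, field, (default,) in zip(CSV_COLUMNS, fields, CSV_COLUMN_DEFAULTS):
        row[name] = type(default)(field) if field != '' else default
    return row


# Illustrative row in the documented column order (an assumption for this sketch).
sample = parse_row('M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15')
features = dict(sample)
label = features.pop('rings')

# The label is an integer, while only 'sex' is a string.
print(type(features['sex']).__name__, type(label).__name__)
```

Under these assumptions the label column ends up as an integer, which matches the `int32` dtype reported in the error message.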

Best answer

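You did almost everything right. The problem is in the definition of the estimator.

The task is to predict the Rings column, which is an integer, so it looks like a regression problem. But you decided to treat it as a classification task, which is also valid: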

def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()

    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        label_vocabulary=_CSV_COLUMNS)

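By default, tf.estimator.LinearClassifier assumes binary classification, i.e. n_classes=2. In your case this is clearly not true; that is the first mistake. You also set label_vocabulary, which TensorFlow interprets as the set of possible values in the label column. Because you passed a list of strings, it expects the labels to have the tf.string dtype, while your parsed rings labels are int32, hence the error. Since Rings is an integer, you don't need label_vocabulary at all.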
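The dtype complaint can be pictured with a small plain-Python stand-in for a vocabulary lookup table. This is only a sketch, assuming the table checks key types before looking them up; `VocabularyTable` is a hypothetical class written for this example, not TensorFlow's implementation.

```python
class VocabularyTable:
    """Hypothetical stand-in for a string-keyed label lookup table."""

    def __init__(self, vocabulary):
        # A vocabulary of strings implies string keys, as with label_vocabulary.
        self._key_type = str
        self._index = {word: i for i, word in enumerate(vocabulary)}

    def lookup(self, keys):
        # Reject keys of the wrong type before doing any lookup.
        for key in keys:
            if not isinstance(key, self._key_type):
                raise TypeError(
                    'Signature mismatch. Keys must be %s, got %s.'
                    % (self._key_type.__name__, type(key).__name__))
        # Unknown keys map to -1 in this sketch.
        return [self._index.get(key, -1) for key in keys]


table = VocabularyTable(['sex', 'length', 'rings'])
print(table.lookup(['rings']))      # string keys are accepted

try:
    table.lookup([15])              # integer ring counts are rejected
except TypeError as err:
    print(err)
```

Seen this way, the integer labels never reach the lookup itself: the type check fails first, which is consistent with the traceback pointing into lookup_ops.py.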

Putting it all together:

def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()

    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        n_classes=30)
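Without label_vocabulary, the classifier expects integer class ids in the range 0 to n_classes - 1, so it can help to check the ring counts against the chosen n_classes before training. The helper below is a hypothetical pre-flight check in plain Python, and the ring values are made up for illustration.

```python
def labels_fit(labels, n_classes):
    """Return True if every label is an integer in [0, n_classes)."""
    return all(isinstance(y, int) and 0 <= y < n_classes for y in labels)


rings = [15, 7, 9, 10, 29]          # made-up ring counts, not real data

# With 30 classes every value above is a valid class id.
print(labels_fit(rings, n_classes=30))

# With the default of 2 classes most of them fall outside the range.
print(labels_fit(rings, n_classes=2))
```

If the real data contained a ring count of 30 or more, n_classes would have to be raised accordingly.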

I suggest you also try tf.estimator.LinearRegressor, which may be more accurate.

Regarding python - TensorFlow - `keys` or `default_value` doesn't match the table data types, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48206320/
