
python - Tensorflow Estimator: loss not decreasing when using tf.feature_column.embedding_column for a list of categorical variables

Reposted. Author: 行者123. Updated: 2023-11-30 09:17:12

I am very new to the Tensorflow Estimator API. I would like to know whether it is possible to pass lists of categorical variables as features to an estimator and have them automatically converted into embeddings. For example, below are sample records from a CSV file. Each record contains two lists of categorical variables (wrapped in square brackets), "country" and "watched", two plain categorical variables, "day_of_week" and "day_period", and a target, in this case "movie_id".

day_of_week,day_period,country,movie_id,watched
SUNDAY,EVENING,[USA,UK],B2JO1owWbeLn,[WGdZ5qZmLw0,abcdef]
MONDAY,EVENING,[China],xxx,[abc,def,ijk]

According to the documentation at https://www.tensorflow.org/api_docs/python/tf/feature_column, "day_of_week" and "day_period" can be represented as a categorical_column_with_vocabulary_list. That part is straightforward. However, "country" and "watched" are lists of categorical variables, and I want to combine each list into a single embedding. According to the same documentation, tf.feature_column.embedding_column should do the trick.
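Conceptually, embedding_column with its default combiner='mean' looks up one vector per list element and averages them into a single fixed-size vector. A minimal pure-Python sketch of that behavior (the vocabulary, table values, and dimension below are invented for illustration):

```python
# Toy stand-in for an embedding_column over a variable-length list.
vocab = {"USA": 0, "UK": 1, "China": 2}

# One 4-dim embedding row per vocabulary entry (made-up numbers).
embedding_table = [
    [0.1, 0.2, 0.3, 0.4],   # USA
    [0.5, 0.6, 0.7, 0.8],   # UK
    [0.9, 1.0, 1.1, 1.2],   # China
]

def embed_list(values):
    """Look up each value's row and mean-combine them,
    mirroring embedding_column's default combiner='mean'."""
    rows = [embedding_table[vocab[v]] for v in values]
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

vec = embed_list(["USA", "UK"])  # mean of the USA and UK rows
```

However long the input list, the output has a fixed dimension, which is what lets a DNN consume it.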

The following function builds the columns representing the input above.

def build_model_columns():
    day_of_week = tf.feature_column.categorical_column_with_vocabulary_list('day_of_week', day_of_weeks)
    day_period = tf.feature_column.categorical_column_with_vocabulary_list('day_period', day_periods)
    country = tf.feature_column.categorical_column_with_vocabulary_list('country', countries)
    watched = tf.feature_column.categorical_column_with_vocabulary_list('watched', movie_emb_ids)

    columns = [
        tf.feature_column.indicator_column(day_of_week),
        tf.feature_column.indicator_column(day_period),
        tf.feature_column.embedding_column(country, 8),
        tf.feature_column.embedding_column(watched, 32),
    ]
    return columns

The following functions generate the training dataset.

def tensor_to_array(tensor):
    length = tf.size(tf.string_split([tensor], ""))
    sub = tf.substr(tensor, 1, length - 2)  # remove the leading '[' and trailing ']'
    splits = tf.string_split([sub], delimiter=',')
    return splits
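For reference, the transformation tensor_to_array performs with TF string ops is, in plain Python:

```python
def parse_bracketed_list(field):
    """Remove the leading '[' and trailing ']' and split on commas —
    the same steps tensor_to_array performs with TF string ops."""
    return field[1:-1].split(",")
```

For example, parse_bracketed_list("[USA,UK]") returns ["USA", "UK"]. Note that either version silently mangles the field if the brackets are missing or the values themselves contain commas.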

def train_input_fn():
    train_files = "train.csv"
    target_files = "target.csv"
    target_table, target_ids = read_table_lookup(target_files, "movie")

    def preprocess(day_of_week, day_period, country, movie_id, watched):
        features = {
            'day_of_week': day_of_week,
            'day_period': day_period,
            'country': tensor_to_array(country),
            'watched': tensor_to_array(watched),
        }
        # target_table is a lookup table converting "movie_id" to integer "id"
        return features, target_table.lookup(movie_id)

    dataset = (tf.contrib.data.CsvDataset(train_files, record_defaults, header=True)
               .map(preprocess, num_parallel_calls=5)
               .batch(batch_size=batch_size, drop_remainder=False)
               .repeat())

    # iterator = dataset.make_initializable_iterator()
    # tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)

    return dataset

Below is the snippet used to create and train the estimator.

hidden_units = [512, 512]
record_defaults = [[""]] * 5
columns = build_model_columns()
estimator = tf.estimator.DNNClassifier(model_dir="dir",
                                       feature_columns=columns,
                                       hidden_units=hidden_units,
                                       n_classes=len(target_ids))  # length of all targets

estimator.train(input_fn=train_input_fn)

I don't get any errors and everything seems to work as expected, but the training loss is very large: it fluctuates around 3,xxx and never decreases. See below.

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /home/cocoza4/movie_models/deep/model.ckpt.
INFO:tensorflow:loss = 6538.0645, step = 0
INFO:tensorflow:global_step/sec: 17.353
INFO:tensorflow:loss = 3596.562, step = 100 (5.764 sec)
INFO:tensorflow:global_step/sec: 17.434
INFO:tensorflow:loss = 3504.936, step = 200 (5.736 sec)
INFO:tensorflow:global_step/sec: 17.4234
INFO:tensorflow:loss = 3500.0488, step = 300 (5.739 sec)
INFO:tensorflow:global_step/sec: 17.5321
INFO:tensorflow:loss = 3480.702, step = 400 (5.705 sec)
INFO:tensorflow:global_step/sec: 17.4534
INFO:tensorflow:loss = 3517.599, step = 500 (5.729 sec)
INFO:tensorflow:global_step/sec: 17.3421
INFO:tensorflow:loss = 3446.142, step = 600 (5.769 sec)
INFO:tensorflow:global_step/sec: 17.313
INFO:tensorflow:loss = 3281.3088, step = 700 (5.776 sec)
INFO:tensorflow:global_step/sec: 17.4421
INFO:tensorflow:loss = 3326.7336, step = 800 (5.731 sec)
INFO:tensorflow:global_step/sec: 17.3619
INFO:tensorflow:loss = 3464.902, step = 900 (5.762 sec)
INFO:tensorflow:global_step/sec: 17.2013
INFO:tensorflow:loss = 3364.2153, step = 1000 (5.813 sec)
INFO:tensorflow:global_step/sec: 17.4429
INFO:tensorflow:loss = 3410.449, step = 1100 (5.734 sec)
INFO:tensorflow:global_step/sec: 17.0483
INFO:tensorflow:loss = 3351.018, step = 1200 (5.866 sec)
INFO:tensorflow:global_step/sec: 17.4214
INFO:tensorflow:loss = 3386.995, step = 1300 (5.740 sec)
INFO:tensorflow:global_step/sec: 17.7965
INFO:tensorflow:loss = 3263.6074, step = 1400 (5.617 sec)
INFO:tensorflow:global_step/sec: 17.6944
INFO:tensorflow:loss = 3321.574, step = 1500 (5.652 sec)
INFO:tensorflow:global_step/sec: 17.3603
INFO:tensorflow:loss = 3234.7761, step = 1600 (5.760 sec)
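One way to sanity-check the magnitude of that plateau: in TF 1.x, DNNClassifier reports the sum of the cross-entropy over the batch, and a uniform random guess over N classes costs ln(N) per example. The batch size and class count below are assumptions for illustration, not values from the question:

```python
import math

# Assumed values, purely illustrative — the question does not state them.
batch_size = 512
n_classes = 1000

per_example = math.log(n_classes)      # cross-entropy of a uniform guess, ≈ 6.91
batch_loss = batch_size * per_example  # what DNNClassifier would report, ≈ 3537
```

If the real values were in this ballpark, a loss hovering near 3,500 would be consistent with the model predicting at roughly chance level, which would point at the input pipeline (e.g. the list parsing) rather than the optimizer.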

I wonder whether I did something wrong when preparing the training data?

Thanks,

Peeranat F.

Best Answer

The first thing that catches my attention is the number of hidden units. I would try tuning your hidden units. Usually the layer sizes should decrease, so I would try [512, 256, 128].
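The decreasing shape the answer suggests can be generated by repeated halving (a trivial helper, not part of the answer):

```python
def pyramid(top, depth):
    """Decreasing layer sizes, each half the previous width."""
    return [top >> i for i in range(depth)]

hidden_units = pyramid(512, 3)  # [512, 256, 128]
```

This would replace hidden_units = [512, 512] in the estimator setup from the question.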

Regarding "python - Tensorflow Estimator: loss not decreasing when using tf.feature_column.embedding_column for a list of categorical variables", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52591291/
