
python - Tensorflow tf.train.Saver does not save all variables

Reposted. Author: 行者123. Updated: 2023-12-01 09:18:29

I thought the TensorFlow saver would save all variables, as described here:

If you do not pass any arguments to tf.train.Saver(), the saver handles all variables in the graph. Each variable is saved under the name that was passed when the variable was created.

https://www.tensorflow.org/programmers_guide/saved_model
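The quoted behaviour can be checked in isolation. The sketch below is written against `tf.compat.v1` so it runs in graph mode under TensorFlow 2.x (the question's own code uses the TF 1.x names such as `tf.get_variable` and `tf.train.Saver`); the tiny `embeddings` shape and the temporary directory are illustrative assumptions. It shows that an integer counter that no loss depends on is still written to the checkpoint by a `Saver` created without arguments.

```python
import os
import tempfile

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    embeddings = tf.compat.v1.get_variable(
        "embeddings", initializer=tf.random.uniform([4, 2], -1.0, 1.0))
    # An integer counter that no loss or optimizer ever touches.
    epoch_count = tf.compat.v1.get_variable("epochCount", initializer=0)
    init = tf.compat.v1.global_variables_initializer()
    # No var_list argument: the saver covers all global variables in the graph.
    saver = tf.compat.v1.train.Saver()

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    saver.save(sess, ckpt_prefix)

# List the variable names stored in the checkpoint files on disk.
saved_names = [name for name, _shape in tf.train.list_variables(ckpt_prefix)]
print(saved_names)
```

Both `embeddings` and `epochCount` should appear in the printed list, so being unused by the loss does not by itself keep a variable out of the checkpoint.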

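However, the variable epochCount in my code below does not seem to be saved. I use it to track the total number of epochs the model has been trained on the data.

When I restore the graph, it is reset to its initializer value instead of the value it had when the last checkpoint was saved.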

It looks to me as if the saver only stores the variables that are used to compute the loss.

Here is my code.

This is where I declare my graph:

```python
graph = tf.Graph()

with graph.as_default():

    valid_examples = np.array(random.sample(range(1, valid_window), valid_size))  # put inside graph to get new words each time

    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, cbow_window*2])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
    valid_datasetSM = tf.constant(valid_examples, dtype=tf.int32)

    epochCount = tf.get_variable('epochCount', initializer=0)  # to store epoch count to total # of epochs are known

    embeddings = tf.get_variable('embeddings',
        initializer=tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

    softmax_weights = tf.get_variable('softmax_weights',
        initializer=tf.truncated_normal([vocabulary_size, embedding_size],
                                        stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.get_variable('softmax_biases',
        initializer=tf.zeros([vocabulary_size]), trainable=False)

    embed = tf.nn.embedding_lookup(embeddings, train_dataset)  # train data set is
    embed_reshaped = tf.reshape(embed, [batch_size*cbow_window*2, embedding_size])
    segments = np.arange(batch_size).repeat(cbow_window*2)
    averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)

    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=averaged_embeds,
                                   labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)  # Original learning rate was 1.0

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keepdims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(
        normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

    saver = tf.train.Saver()
```

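If I restore the graph from a checkpoint, embeddings and softmax_biases are both restored, but epochCount is reset to its initializer value. (Note that I am not calling tf.global_variables_initializer().run(), which is a common cause of variables being wrongly reset after restoring a checkpoint.)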

This is the code that runs the graph:

```python
num_steps = 1000001

with tf.Session(graph=graph) as session:

    saver.restore(session, './checkpointsBook2VecCbowWindow2Downloaded/bookVec.ckpt')
    average_loss = 0
    saveIteration = 1
    for step in range(1, num_steps):

        batch_data, batch_labels = generate_batch(
            batch_size, cbow_window)
        feed_dict = {train_dataset: batch_data, train_labels: batch_labels}
        _, l = session.run([optimizer, loss], feed_dict=feed_dict)

        if step % 20000 == 0:
            recEpoch_indexA = epoch_index - recEpoch_indexA
            epochCount = tf.add(epochCount, recEpoch_indexA, name=None)
            recEpoch_indexA = epoch_index

            save_path = saver.save(session, "checkpointsBook2VecCbowWindow2/bookVec.ckpt")
            chptName = 'B2VCbowW2Embed256ckpt' + str(saveIteration)
            zipfolder(chptName, 'checkpointsBook2VecCbowWindow2')
            uploadModel.SetContentFile(chptName + ".zip")
            uploadModel.Upload()

            print("Checkpoint uploaded to Google Drive")
            saveIteration += 1
```

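This is the code I use after training to print all the variables saved in the checkpoint. I restore the graph and print every variable it contains.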

```python
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('./MODEL/bookVec.ckpt.meta')
    saver.restore(sess, './MODEL/bookVec.ckpt')
    for v in tf.get_default_graph().get_collection("variables"):
        print('From variables collection ', v)
```

This is the output of the code above:

```
From variables collection <tf.Variable 'embeddings:0' shape=(10001, 256) dtype=float32_ref>
From variables collection <tf.Variable 'softmax_weights:0' shape=(10001, 256) dtype=float32_ref>
From variables collection <tf.Variable 'softmax_biases:0' shape=(10001,) dtype=float32_ref>
```

As you can see, epochCount has not been saved.
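Listing a collection of an imported meta graph shows what the graph definition declares, not the tensor data stored in the checkpoint. One way to look at the checkpoint contents directly is `tf.train.list_variables` together with `tf.train.load_variable`. The sketch below assumes TensorFlow 2.x with `tf.compat.v1` and first writes a small stand-in checkpoint so it runs on its own; with the real model, `ckpt_prefix` would be the prefix passed to `saver.save` (for example `'./MODEL/bookVec.ckpt'`).

```python
import os
import tempfile

import tensorflow as tf

# Stand-in checkpoint so the sketch is self-contained; for the real model,
# set ckpt_prefix to the prefix that was passed to saver.save.
graph = tf.Graph()
with graph.as_default():
    epoch_count = tf.compat.v1.get_variable("epochCount", initializer=0)
    set_count = tf.compat.v1.assign(epoch_count, 3)
    saver = tf.compat.v1.train.Saver()

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "bookVec.ckpt")
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(epoch_count.initializer)
    sess.run(set_count)
    saver.save(sess, ckpt_prefix)

# Read names, shapes and stored values from the checkpoint files themselves,
# independently of any graph or meta-graph collection.
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape, tf.train.load_variable(ckpt_prefix, name))
```

Because this reads the saved tensors rather than a rebuilt graph, it also shows the stored value, which is what matters when checking whether a counter was restored correctly.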

Best answer

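The variable is restored as 0 because it is never actually updated (that is, it is restored correctly)! During the session you overwrite the Python name epochCount with the result of a tf.add call. tf.add only adds a new operation to the graph and returns its output tensor; it does not compute a value or write anything back to the variable. In other words, the variable (in the TensorFlow sense) is "orphaned" and will stay at 0 forever.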
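A minimal sketch of this point, assuming TensorFlow 2.x with `tf.compat.v1` in graph mode: rebinding the Python name with `tf.add` builds new ops each time, but the variable itself keeps its initial value. The names `epoch_count_var` and `variable_read` are illustrative.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    epoch_count_var = tf.compat.v1.get_variable("epochCount", initializer=0)
    variable_read = tf.identity(epoch_count_var)  # reads the variable itself
    epochCount = epoch_count_var  # the Python name initially refers to the variable
    for _ in range(3):
        # Mirrors the question's loop: each tf.add creates a new Add op and
        # rebinds the Python name to its output; the variable is never written.
        epochCount = tf.add(epochCount, 1)
    init = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    tensor_value, variable_value = sess.run([epochCount, variable_read])
    print(tensor_value, variable_value)  # 3 0
```

The final tensor evaluates to 3 while the variable, which is what the Saver writes, is still 0. Calling tf.add inside the training loop, as the question does, also adds a new op to the graph every 20000 steps.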

You can use tf.assign to update the variable. It might look something like this:

```python
# where you define the graph
epochCount = tf.get_variable('epochCount', initializer=0)
update_epoch = tf.assign(epochCount, epochCount + 1)
...
# after you launched the session
for step in range(1, num_steps):
    if step % 20000 == 0:
        sess.run(update_epoch)
```
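A fuller round-trip sketch of the same fix, assuming TensorFlow 2.x with `tf.compat.v1`. Instead of the answer's `+ 1`, it uses `tf.compat.v1.assign_add` with a hypothetical scalar placeholder `epoch_delta`, standing in for the per-checkpoint epoch difference the question computes in `recEpoch_indexA`. The `build_graph` helper and the `graph.finalize()` call are illustrative additions, not part of the original answer: `finalize()` makes the graph read-only, so a stray op created inside the training loop raises an error instead of silently growing the graph.

```python
import os
import tempfile

import tensorflow as tf


def build_graph():
    """Hypothetical helper: a graph holding only the epoch counter and its update op."""
    graph = tf.Graph()
    with graph.as_default():
        epoch_count = tf.compat.v1.get_variable("epochCount", initializer=0)
        # Hypothetical feed for "epochs completed since the last update".
        epoch_delta = tf.compat.v1.placeholder(tf.int32, shape=[])
        # Define the update op once, alongside the variable, not inside the loop.
        update_epoch = tf.compat.v1.assign_add(epoch_count, epoch_delta)
        read_epoch = tf.identity(epoch_count)
        init = tf.compat.v1.global_variables_initializer()
        saver = tf.compat.v1.train.Saver()
    # From here on, creating any new op in this graph (e.g. a stray tf.add
    # in the training loop) raises RuntimeError.
    graph.finalize()
    return graph, epoch_delta, update_epoch, read_epoch, init, saver


ckpt_prefix = os.path.join(tempfile.mkdtemp(), "bookVec.ckpt")

# First "training run": initialize, bump the counter twice, then save.
graph, epoch_delta, update_epoch, read_epoch, init, saver = build_graph()
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    sess.run(update_epoch, feed_dict={epoch_delta: 2})
    sess.run(update_epoch, feed_dict={epoch_delta: 3})
    saver.save(sess, ckpt_prefix)

# Second run: a fresh graph, restored from the checkpoint with no initializer call.
graph, epoch_delta, update_epoch, read_epoch, init, saver = build_graph()
with tf.compat.v1.Session(graph=graph) as sess:
    saver.restore(sess, ckpt_prefix)
    restored_epochs = sess.run(read_epoch)
    print(restored_epochs)  # 5
```

The restored counter holds the accumulated value, which is the behaviour the question expected from the checkpoint.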

For this question (python - Tensorflow tf.train.Saver does not save all variables), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51016069/
