
machine-learning - How to store a dictionary and map words to integers when using Tensorflow Serving?

Reposted. Author: 行者123. Updated: 2023-11-30 09:18:52

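I have trained an LSTM RNN classification model in Tensorflow. I save and restore checkpoints so that I can retrain the model and use it for testing. Now I want to use Tensorflow Serving so that I can use the model in production.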

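Originally, I would parse the corpus to create a dictionary and then use that dictionary to map the words in a string to integers. I would then store the dictionary in a pickle file, which I could reload when restoring a checkpoint to retrain on the dataset, or when simply using the model, so that the mapping stays consistent. How do I store this dictionary when saving the model with SavedModelBuilder?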
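The question's RnnPreprocessing.map_vocab_to_int is not shown, so the following is a hypothetical stand-in that illustrates the workflow described above: build a word-to-integer dictionary from the corpus, encode the reviews with it, and round-trip the dictionary through pickle. Ordering words by frequency and numbering them from 1 are assumptions (numbering from 1 is suggested by n_words = len(vocab_to_int) + 1 in the graph code).

```python
import pickle
from collections import Counter
from io import BytesIO


def build_vocab(reviews):
    """Hypothetical stand-in for map_vocab_to_int: word -> integer id.

    Most frequent word gets id 1; id 0 is left free (assumed to be padding).
    """
    counts = Counter(word for review in reviews for word in review.split())
    # most_common() sorts by count; equal counts keep first-seen order.
    ordered = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate(ordered, 1)}


def encode(reviews, vocab_to_int):
    """Replace every word with its id (every word is in-vocabulary here)."""
    return [[vocab_to_int[w] for w in review.split()] for review in reviews]


reviews = ["great movie", "bad movie", "great great plot"]
vocab_to_int = build_vocab(reviews)
encoded = encode(reviews, vocab_to_int)

# Pickle round trip: the exact same dict must come back for the mapping to stay consistent.
buffer = BytesIO()
pickle.dump(vocab_to_int, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)
restored = pickle.load(buffer)
```

Here "great" (3 occurrences) gets id 1 and "movie" (2) gets id 2, so the first review encodes to [1, 2].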

My neural network code is below. The code that saves the model is near the end (I include an outline of the whole structure for context):

...


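import pickle

import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model import utils

# Read files and store them in variables
with open('./someReview.txt', 'r') as f:
    reviews = f.read()
with open('./someLabels.txt', 'r') as f:
    labels = f.read()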

...

# Pre-processing functions
# Parse through the dataset and create a vocabulary
vocab_to_int, reviews = RnnPreprocessing.map_vocab_to_int(reviews)
with open(pickle_path, 'wb') as handle:
    pickle.dump(vocab_to_int, handle, protocol=pickle.HIGHEST_PROTOCOL)

#More preprocessing functions
...


# Building the graph
lstm_size = 256
lstm_layers = 2
batch_size = 1000
learning_rate = 0.01
n_words = len(vocab_to_int) + 1

# Create the graph object
tf.reset_default_graph()
with tf.name_scope('inputs'):
    inputs_ = tf.placeholder(tf.int32, [None, None], name="inputs")
    labels_ = tf.placeholder(tf.int32, [None, None], name="labels")
    keep_prob = tf.placeholder(tf.float32, name="keep_prob")

# Create the embedding layer, LSTM cell, and LSTM layers

...

# Forward pass
with tf.name_scope("RNN_forward"):
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)

# Output. We are only interested in the latest output of the LSTM cell
with tf.name_scope('predictions'):
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    tf.summary.histogram('predictions', predictions)
# More functions for cost, accuracy, optimizer initialization

...

# Training
epochs = 1
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)

        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            summary, loss, state, _ = sess.run([merged, cost, final_state, optimizer], feed_dict=feed)

            train_writer.add_summary(summary, iteration)

            if iteration % 1 == 0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration % 2 == 0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    summary, batch_acc, val_state = sess.run([merged, accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration += 1
            test_writer.add_summary(summary, iteration)

    # Saving the model. This must stay inside the session so the trained
    # variables are still available to add_meta_graph_and_variables.
    export_path = './SavedModel'
    print('Exporting trained model to %s' % export_path)

    builder = saved_model_builder.SavedModelBuilder(export_path)

    # Build the signature_def_map.
    classification_inputs = utils.build_tensor_info(inputs_)
    # Export the sigmoid scores from `predictions`; labels_ is an input
    # placeholder, not a model output.
    classification_outputs_scores = utils.build_tensor_info(predictions)

    classification_signature = signature_def_utils.build_signature_def(
        inputs={signature_constants.CLASSIFY_INPUTS: classification_inputs},
        outputs={
            signature_constants.CLASSIFY_OUTPUT_SCORES:
                classification_outputs_scores,
        },
        method_name=signature_constants.CLASSIFY_METHOD_NAME)

    legacy_init_op = tf.group(
        tf.tables_initializer(), name='legacy_init_op')
    # Add the signatures to the servable
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING],
        signature_def_map={
            signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                classification_signature
        },
        legacy_init_op=legacy_init_op)
    print("added meta graph and variables")

    # Save it!
    builder.save()
    print("model saved")

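I'm not entirely sure this is the correct way to save this kind of model, but it is the only implementation I found in the documentation and in online tutorials.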

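I haven't found any example or clear guidance in the documentation on how to save the dictionary, or on how to use it when restoring the saved model.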

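With checkpoints, I simply load the pickle file before running the session. How can I restore this saved model so that the dictionary gives me the same word-to-int mapping? Is there a particular way I should save or load the model?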
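SavedModelBuilder saves the graph, its variables, and graph assets, so a Python dictionary used outside the graph is not restored with the model. One option, sketched below, is to copy the dictionary into the assets.extra subdirectory of the export directory, which the SavedModel format reserves for files that travel with the model but are not loaded by the graph; the client then reloads it before encoding requests. The file name vocab_to_int.pickle and both helper names are hypothetical. Run the save step after builder.save(), because SavedModelBuilder may refuse to reuse an export directory that already exists.

```python
import os
import pickle
import tempfile

VOCAB_FILE = "vocab_to_int.pickle"  # hypothetical file name


def save_vocab_next_to_model(export_path, vocab_to_int):
    """Write the dictionary to <export_path>/assets.extra/ (call after builder.save())."""
    extra_dir = os.path.join(export_path, "assets.extra")
    os.makedirs(extra_dir, exist_ok=True)
    path = os.path.join(extra_dir, VOCAB_FILE)
    with open(path, "wb") as handle:
        pickle.dump(vocab_to_int, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_vocab_from_model(export_path):
    """Client side: reload the same dictionary that was used in training."""
    path = os.path.join(export_path, "assets.extra", VOCAB_FILE)
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    export_path = os.path.join(tmp, "SavedModel")  # stands in for './SavedModel'
    saved_path = save_vocab_next_to_model(export_path, {"great": 1, "movie": 2})
    restored = load_vocab_from_model(export_path)
    exists = os.path.isfile(saved_path)
```

Tensorflow Serving itself does not apply this file; it only keeps the dictionary versioned together with the model it belongs to.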

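I also used inputs_ as the input of the signature. It holds the sequence of integers after the words have been mapped. I can't specify a string as the input, because I get AttributeError: 'str' object has no attribute 'dtype'. In that case, how exactly are words mapped to integers in the production model?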
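With the signature as written, the model accepts only integer ids, so the client has to apply the training dictionary before each request. (The AttributeError most likely comes from passing a Python string where utils.build_tensor_info expects a tensor.) A minimal client-side sketch follows; lowercasing, mapping unknown words to 0, left-padding with 0, and the default sequence length of 200 are all assumptions that must be matched to the actual training preprocessing.

```python
def encode_for_serving(text, vocab_to_int, seq_len=200):
    """Turn a raw review into one fixed-length row of ids for inputs_.

    Assumptions: words are lowercased and whitespace-split, unknown words
    map to 0, and rows are left-padded with 0 or truncated to seq_len.
    """
    ids = [vocab_to_int.get(word, 0) for word in text.lower().split()]
    ids = ids[:seq_len]
    return [0] * (seq_len - len(ids)) + ids


vocab_to_int = {"great": 1, "movie": 2}
row = encode_for_serving("Great unseen movie", vocab_to_int, seq_len=5)
# A request would then carry the batch [row], matching inputs_' shape [None, None].
```

Left-padding keeps the real words at the end of the sequence, which matters here because the graph reads only outputs[:, -1].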

Best answer

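Implement the preprocessing with the utilities in tf.feature_column (for example, a categorical column built from a vocabulary list or a vocabulary file). The word-to-integer lookup then happens inside the exported graph, so serving applies exactly the same mapping to raw string input.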
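One way to follow this advice without renumbering the existing ids is to turn the pickled dictionary into a plain vocabulary file. tf.feature_column.categorical_column_with_vocabulary_file assigns each entry the id of its 0-based line number, so a hypothetical padding token on line 0 preserves the question's ids, which (given n_words = len(vocab_to_int) + 1) appear to start at 1. This sketch only writes and re-reads the file in plain Python; passing it to the feature column (with vocabulary_size = len(vocab_to_int) + 1) is left to the model code.

```python
import os
import tempfile

PAD_TOKEN = "<PAD>"  # hypothetical placeholder token that occupies id 0


def write_vocab_file(vocab_to_int, path):
    """Write one token per line so that 0-based line number == integer id."""
    by_id = sorted(vocab_to_int.items(), key=lambda item: item[1])
    ids = [i for _, i in by_id]
    if ids != list(range(1, len(ids) + 1)):
        raise ValueError("ids must be contiguous and start at 1")
    with open(path, "w", encoding="utf-8") as f:
        f.write(PAD_TOKEN + "\n")  # line 0
        for word, _ in by_id:
            f.write(word + "\n")


def read_vocab_file(path):
    """Rebuild the mapping the way a line-number lookup would see it."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): i for i, line in enumerate(f)}


vocab_to_int = {"great": 1, "movie": 2, "bad": 3}
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    write_vocab_file(vocab_to_int, vocab_path)
    file_mapping = read_vocab_file(vocab_path)
```

Reading the file back confirms that every word keeps the id it had in the pickled dictionary.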

A similar question, "machine-learning - How to store a dictionary and map words to integers when using Tensorflow Serving?", can be found on Stack Overflow: https://stackoverflow.com/questions/47399201/
