python-3.x - Deep learning Keras model CTC_Loss gives loss = infinity


I have a CRNN model for text recognition, published on GitHub and trained on English.

Now I am using the same algorithm, but for Arabic.

My CTC function is:

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
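
Note that because the first two time steps are sliced off here, the input_length fed to K.ctc_batch_cost must describe the post-slice width. A minimal sketch of that arithmetic, using the parameters from the model below (the batch_size value is an assumption for illustration):

import numpy as np

img_w = 128
pool_size = 2
time_steps = img_w // (pool_size ** 2)   # two 2x2 max-poolings: 128 // 4 = 32 RNN steps
ctc_input_len = time_steps - 2           # minus the 2 steps dropped in ctc_lambda_func

batch_size = 16                          # assumed value, not from the question
input_length = np.full((batch_size, 1), ctc_input_len, dtype=np.int64)  # 30 per sample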

My model is:

def get_Model(training):
    img_w = 128
    img_h = 64

    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    rnn_size = 128

    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)

    # Initialising the CNN
    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

    conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=act, name='dense1')(inner)

    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)

    # transforms RNN output to character activations:
    inner = Dense(num_classes + 1, kernel_initializer='he_normal',
                  name='dense2')(concatenate([gru_2, gru_2b]))
    y_pred = Activation('softmax', name='softmax')(inner)
    Model(inputs=input_data, outputs=y_pred).summary()

    labels = Input(name='the_labels', shape=[30], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')

    # Keras doesn't currently support loss funcs with extra parameters,
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])

    # clipnorm seems to speed up convergence (set on the optimizer at compile time)
    # the loss calc occurs in the Lambda layer above, so a dummy lambda func
    # is used as the loss when compiling
    if training:
        return Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)

    return Model(inputs=[input_data], outputs=y_pred)

Then I compile it with the SGD optimizer (I tried both SGD and Adam):

sgd = SGD(lr=0.0000002, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)

Then I fit the model on my training set (images of words up to 30 characters long, with label sequences padded to length 30):

model.fit_generator(generator=tiger_train.next_batch(),
                    steps_per_epoch=int(tiger_train.n / batch_size),
                    epochs=30,
                    callbacks=[checkpoint],
                    validation_data=tiger_val.next_batch(),
                    validation_steps=int(tiger_val.n / val_batch_size))
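
For context, a generator feeding this training model has to yield a dict of the four named inputs plus a dummy target for the 'ctc' output. The internals of tiger_train are not shown in the question, so all shapes and fill values in this sketch are assumptions derived from the model above:

import numpy as np

def next_batch(batch_size=16, img_w=128, img_h=64, max_label_len=30):
    # hypothetical generator illustrating the expected batch layout
    while True:
        inputs = {
            'the_input': np.zeros((batch_size, img_w, img_h, 1)),          # word images
            'the_labels': np.zeros((batch_size, max_label_len)),           # padded label indices
            'input_length': np.full((batch_size, 1), 30, dtype=np.int64),  # RNN steps after the slice
            'label_length': np.ones((batch_size, 1), dtype=np.int64),      # true length of each label (placeholder)
        }
        outputs = {'ctc': np.zeros((batch_size,))}  # dummy target; the Lambda layer computes the real loss
        yield inputs, outputs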

It gives me loss = inf as soon as training starts, and I searched a lot but could not find a similar problem.

So my question is: how can I solve this, and what can make ctc_loss compute an infinite cost?

Thanks in advance.

Best answer

I found the problem: it was a dimension issue.

For CRNN OCR with a CTC layer, if you are detecting a sequence of length n, the image should be wide enough that the network produces at least (2*n - 1) time steps. More is better, up to an optimal image-width/time-step ratio that lets the CTC layer align the letters correctly. If the image yields fewer than (2*n - 1) time steps, CTC produces a nan/infinite loss.
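
Applying that to the model above as a quick sanity check (all values taken from the question):

img_w = 128
pool_size = 2
max_label_len = 30

time_steps = img_w // (pool_size ** 2) - 2   # 32 pooled steps minus the 2 sliced off = 30
required = 2 * max_label_len - 1             # 59 steps needed for a 30-character label
print(time_steps, required, time_steps >= required)  # 30 59 False -> infinite CTC cost

So for 30-character labels the input image would have to be wide enough to yield at least 59 time steps, or the maximum label length would have to be reduced.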

Regarding python-3.x - Deep learning Keras model CTC_Loss gives loss = infinity, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52283000/
