Python machine learning digit recognition

Repost. Author: 行者123. Updated: 2023-11-30 09:33:46

I am following the code from this site:

https://blog.luisfred.com.br/reconhecimento-de-escrita-manual-com-redes-neurais-convolucionais/

Here is the code the site walks through:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
import numpy as np
from matplotlib import pyplot as plt
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')
import cv2
import matplotlib.pyplot as plt
# %matplotlib inline  # If you are using Jupyter, this is useful for plotting figures inside cells

# Split the data into training and testing subsets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Since we are working in gray scale we can
# set the depth to the value 1.
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# We normalize our data according to the
# gray scale. The floating point values are in the range [0, 1], instead of [0, 255]
X_train = X_train / 255
X_test = X_test / 255
# Converts y_train and y_test, which are class vectors, to a binary class array (one-hot vectors)
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Number of digit types found in MNIST. In this case, the value is 10, corresponding to (0,1,2,3,4,5,6,7,8,9).
num_classes = y_test.shape[1]


def deeper_cnn_model():
    model = Sequential()
    # Conv2D will be our input layer. We can observe that it has
    # 30 feature maps with size of 5 x 5 and an activation function of type ReLU.
    model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    # The MaxPooling2D layer will be our second layer where we will have a sample window of size 2 x 2
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # A new convolutional layer, with 15 feature maps of size 3 x 3, and activation function ReLU
    model.add(Conv2D(15, (3, 3), activation='relu'))
    # A new subsampling with a 2 x 2 dimension pooling.
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # We include a dropout with a 20% probability (you can try other values)
    model.add(Dropout(0.2))
    # We need to convert the output of the convolutional layer, so that it can be used as input to the densely connected layer that is next.
    # What this does is flatten the structure of the output of the convolutional layers, creating a single long vector of features
    # that will be used by the fully connected layer.
    model.add(Flatten())
    # Fully connected layer with 128 neurons.
    model.add(Dense(128, activation='relu'))
    # Followed by a new fully connected layer with 64 neurons
    model.add(Dense(64, activation='relu'))

    # Followed by a new fully connected layer with 32 neurons
    model.add(Dense(32, activation='relu'))
    # The output layer has the number of neurons compatible with the
    # number of classes to be obtained. Notice that we are using a softmax activation function.
    model.add(Dense(num_classes, activation='softmax', name='preds'))
    # Configure the entire training process of the neural network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model


model = deeper_cnn_model()
model.summary()
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
scores = model.evaluate(X_test, y_test, verbose=0)
print("\nacc: %.2f%%" % (scores[1] * 100))


# TODO: extend to check multiple numbers after the training is done

img_pred = cv2.imread('five.JPG', 0)

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
# (shape is a tuple, so it must be compared against a tuple, not a list)
if img_pred.shape != (28, 28):
    img2 = cv2.resize(img_pred, (28, 28))
    img_pred = img2.reshape(28, 28, -1)
else:
    img_pred = img_pred.reshape(28, 28, -1)

# here also we inform the value for the depth = 1, number of rows and columns, which correspond to the 28x28 of the image.
img_pred = img_pred.reshape(1, 1, 28, 28)
pred = model.predict_classes(img_pred)
pred_proba = model.predict_proba(img_pred)
pred_proba = "%.2f%%" % (pred_proba[0][pred] * 100)
print(pred[0], "with probability of", pred_proba)

Finally, I try to predict a digit 5 that I drew and imported (I have also tried other hand-drawn digits, with equally poor results):

img_pred = cv2.imread('five.JPG', 0)

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
# (shape is a tuple, so it must be compared against a tuple, not a list)
if img_pred.shape != (28, 28):
    img2 = cv2.resize(img_pred, (28, 28))
    img_pred = img2.reshape(28, 28, -1)
else:
    img_pred = img_pred.reshape(28, 28, -1)

# here also we inform the value for the depth = 1, number of rows and columns, which correspond to the 28x28 of the image.
img_pred = img_pred.reshape(1, 1, 28, 28)
pred = model.predict_classes(img_pred)
pred_proba = model.predict_proba(img_pred)
pred_proba = "%.2f%%" % (pred_proba[0][pred] * 100)
print(pred[0], "with probability of", pred_proba)

Here is 5.jpg:

hand drawn five image

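But when I input my own digit, the model predicts it incorrectly. Any ideas why this might be? I admit I am new to machine learning and have only just started dabbling in it. My guess is that the centering or the normalization of the image might be off. Any help is much appreciated!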

Edit 1:

The MNIST test digits look like this:

white numbers black backgrounds

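Best answer

You seem to have two problems, and as you suspected, both are related to the preprocessing of your data.

The first is that your image is inverted relative to the training data: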

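  • After reading in one channel of the .jpg with img_pred = cv2.imread('five.JPG', 0), the background pixels are nearly white, with values around 215-238.
  • If you look at the training data in X_train, the background pixels are all zero, and the digits are white or nearly white (upper values of 210-255).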

Try plotting your image next to a few samples from X_train and you will see that they are inverted.
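The same comparison can be made numerically. The following is a minimal sketch, assuming the digit does not touch the image border, so that the median border pixel approximates the background level. The helper name border_background_level and the two synthetic arrays are illustrative and do not come from the original code.

```python
import numpy as np


def border_background_level(image):
    """Median of the outermost ring of pixels, used as a rough background estimate."""
    img = np.asarray(image)
    border = np.concatenate([img[0, :], img[-1, :], img[1:-1, 0], img[1:-1, -1]])
    return float(np.median(border))


# Synthetic stand-ins (assumptions): an MNIST-like digit is bright on a zero
# background, while a photographed drawing is dark ink on near-white paper.
mnist_like = np.zeros((28, 28), dtype=np.uint8)
mnist_like[6:22, 12:16] = 250

drawn_like = np.full((28, 28), 230, dtype=np.uint8)
drawn_like[6:22, 12:16] = 20

print(border_background_level(mnist_like))  # background level of the MNIST-like image
print(border_background_level(drawn_like))  # background level of the drawn image
```

With the real data, the same helper could be called on X_train[0, 0] * 255 and on the array returned by cv2.imread('five.JPG', 0). A large gap between the two numbers suggests the polarity is flipped.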

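The other problem is that the default interpolation in cv2.resize() does not preserve the scaling of the data. After resizing your data, the minimum value jumps up to 60 instead of 0. Compare the values of img_pred.min() and img_pred.max() before and after the resizing step.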
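To illustrate why shrinking an image can raise its minimum, here is a NumPy-only sketch that uses block averaging as a stand-in for an interpolating resize. It is not cv2.resize itself, and the 56x56 test image and the helper name downsample_by_block_mean are made up for illustration.

```python
import numpy as np


def downsample_by_block_mean(image, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks."""
    img = np.asarray(image, dtype=np.float32)
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image dimensions must be divisible by factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical drawing: near-white paper (230) with a one-pixel-wide black stroke (0).
big = np.full((56, 56), 230, dtype=np.uint8)
big[10:46, 27] = 0

small = downsample_by_block_mean(big, 2)
print(big.min(), big.max())      # value range before shrinking
print(small.min(), small.max())  # value range after shrinking
```

Each output pixel that covers the thin stroke also covers some paper, so the darkest output value is a blend of the two rather than pure black. This is the effect to look for when comparing min() and max() around the real resize call.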

You can invert and scale your data so that it looks more like the MNIST input data with a function like this:

def mnist_bytescale(image):
    # Use float for rescaling
    img_temp = image.astype(np.float32)
    # Re-zero the data
    img_temp -= img_temp.min()
    # Re-scale and invert
    img_temp /= (img_temp.max() - img_temp.min())
    img_temp *= 255
    return 255 - img_temp.astype('uint')

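This will flip your data and scale it linearly from 0 to 255, like the data the network was trained on. However, if you plot mnist_bytescale(img_pred), you will notice that the background level in most of the pixels is still not exactly 0, because the background level of your original image was not constant (probably due to JPEG compression). If your network still has trouble with this flipped and scaled data, you can try using np.clip to zero out the background level and see if that helps.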
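One possible way to do that clipping is sketched below. The helper names zero_background and to_network_input, the cutoff value of 40, and the synthetic input are all assumptions for illustration; a suitable cutoff depends on the image. The sketch also divides by 255 before prediction, since the training data in the question was normalized that way.

```python
import numpy as np


def zero_background(image, cutoff):
    """Shift values down by `cutoff`, clip negatives to 0, and stretch back to 0..255."""
    img = np.asarray(image, dtype=np.float32) - cutoff
    img = np.clip(img, 0, None)
    peak = img.max()
    if peak > 0:
        img *= 255.0 / peak
    return img


def to_network_input(image):
    """Reshape a 28x28 image to (1, 1, 28, 28) and scale it to [0, 1] like X_train."""
    img = np.asarray(image, dtype=np.float32)
    return img.reshape(1, 1, 28, 28) / 255.0


# Stand-in for mnist_bytescale(img_pred): a bright digit on a noisy, dark background.
rng = np.random.default_rng(0)
flipped = rng.integers(0, 30, size=(28, 28)).astype(np.float32)
flipped[6:22, 12:16] = 255

cleaned = zero_background(flipped, cutoff=40)  # cutoff is a guess; tune it per image
batch = to_network_input(cleaned)
print(cleaned.min(), cleaned.max(), batch.shape)
```

The resulting batch has the same shape and value range as the rows of X_train, so it could be passed to the model's prediction call in place of the raw reshaped image.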

A similar question about Python machine learning digit recognition can be found on Stack Overflow: https://stackoverflow.com/questions/49741671/
