
python - Preprocessing images for MNIST OCR

Reposted. Author: 太空狗. Updated: 2023-10-30 01:11:29

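I am working on an OCR application in Python that reads digits. I use OpenCV to find the contours in an image, crop it, and then preprocess the crop to 28x28 for use with the MNIST dataset. My images are not square, so I seem to lose a lot of quality when I resize them. Are there any tips or suggestions I could try?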

This is the original image

This is after editing it

And this is the quality it should be

I tried a few tricks I had found, such as dilation and opening. They did not improve the result; they only made it blurrier...


import numpy as np
import cv2
import imutils
from imutils.perspective import four_point_transform
from scipy import ndimage

# Buffers for the flattened 28x28 images and their one-hot labels
images = np.zeros((4, 784))
correct_vals = np.zeros((4, 10))

i = 0

def getBestShift(img):
    # Offset that moves the centre of mass to the centre of the image
    cy, cx = ndimage.center_of_mass(img)

    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)

    return shiftx, shifty

def shift(img, sx, sy):
    # Translate the image by (sx, sy) pixels
    rows, cols = img.shape
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted

for no in [1, 3, 4, 5]:
    image = cv2.imread("images/" + str(no) + ".jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 200)

    # The third argument was cut off in the source; CHAIN_APPROX_SIMPLE is assumed
    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
    # grab_contours handles the differing return shapes across OpenCV versions
    cnts = imutils.grab_contours(cnts)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    displayCnt = None

    for c in cnts:
        # approximate the contour
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)

        # if the contour has four vertices, then we have found
        # the thermostat display
        if len(approx) == 4:
            displayCnt = approx
            break

    if displayCnt is None:
        # no four-sided contour was found in this image
        continue

    warped = four_point_transform(gray, displayCnt.reshape(4, 2))
    # invert so the digit is white on black, as in MNIST
    gray = cv2.resize(255 - warped, (28, 28))
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # strip empty (all-black) rows and columns from every side
    while np.sum(gray[0]) == 0:
        gray = gray[1:]

    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)

    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]

    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)

    rows, cols = gray.shape

    # fit the digit into a 20x20 box while keeping its aspect ratio
    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))
    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))

    # pad back out to 28x28
    colsPadding = (int(np.ceil((28 - cols) / 2.0)), int(np.floor((28 - cols) / 2.0)))
    rowsPadding = (int(np.ceil((28 - rows) / 2.0)), int(np.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')

    # centre the digit by its centre of mass
    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted

    cv2.imwrite("processed/" + str(no) + ".png", gray)
    cv2.imshow("imgs", gray)
    cv2.waitKey(0)



Passing an explicit interpolation method to cv2.resize helps here, for example INTER_AREA:

gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)

This produces the following result once the rest of the processing is complete: [image]
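As a rough illustration of why area-based resampling can help when shrinking by a large factor, here is a small NumPy-only sketch. It is not OpenCV's implementation: it assumes an integer scale factor, and the helper names shrink_by_sampling and shrink_by_area and the synthetic canvas are made up for this example.

```python
import numpy as np

def shrink_by_sampling(img, factor):
    """Keep every `factor`-th pixel (a crude stand-in for point sampling)."""
    return img[::factor, ::factor]

def shrink_by_area(img, factor):
    """Average each factor x factor block (a simplified model of area resampling)."""
    rows, cols = img.shape
    blocks = img.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical input: 280x280 black canvas with a 3-pixel-wide white stroke
# in columns 143..145, which fall between the sampled columns 140 and 150.
canvas = np.zeros((280, 280), dtype=np.float64)
canvas[40:240, 143:146] = 255.0

sampled = shrink_by_sampling(canvas, 10)
averaged = shrink_by_area(canvas, 10)

print("ink kept by sampling: ", sampled.sum())
print("ink kept by averaging:", averaged.sum())
```

In this contrived case the sampled image contains no ink at all, because none of the stroke's columns is a multiple of 10, while the averaged image keeps the stroke as a column of grey pixels. Real digits are less extreme, but thin strokes suffer in a similar way.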

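Comparisons of the interpolation methods are available, but since there are only a handful of them, you can simply try each one and see which works best. The default is INTER_LINEAR.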

Source: a similar question on Stack Overflow, "python - Preprocessing images for MNIST OCR".
