gpt4 book ai didi

python - 如何识别带有彩色背景图像的文本?

转载 作者:太空宇宙 更新时间:2023-11-03 21:39:12 25 4
gpt4 key购买 nike

我是 opencv 和 python 以及 tesseract 的新手。现在,我正在创建一个脚本来识别图像中的文本。我的代码适用于黑色文本和白色背景或黑色背景的白色文本,但不适用于彩色图像。例如,带有蓝色背景的白色文本,例如按钮。字体也会影响这个吗?在这种情况下,我找到了 Reboot 文本(按钮)

这是示例图片

我尝试了一堆通过opencv进行图像预处理的代码和方法,但没有得到结果。图像二值化、降噪、灰度都不好。

这是示例代码:

from PIL import Image
import pytesseract
import cv2
import numpy as np

# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')

filename = 'image.png'
outputname = 'converted.png'

# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)

# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)

# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
image_channels = np.split(np.asarray(morph), 3, axis=2)

channel_height, channel_width, _ = image_channels[0].shape

# apply Otsu threshold to each channel
for i in range(0, 3):
_, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))

# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)

# save the denoised image
cv2.imwrite(outputname, image_channels)

image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
data_string = data_string.encode('utf-8')
open('image.tsv', 'wb').write(data_string)

通过运行代码,我得到了这张图片:[![在此处输入图片描述][1]][1]

以及带 TSV 参数的 tesseract 结果:

level   page_num    block_num   par_num line_num    word_num    left    top width   height  conf    text
1 1 0 0 0 0 0 0 1024 768 -1
2 1 1 0 0 0 2 13 1002 624 -1
3 1 1 1 0 0 2 13 1002 624 -1
4 1 1 1 1 0 172 13 832 22 -1
5 1 1 1 1 1 172 13 127 22 84 CONFIGURATION
5 1 1 1 1 2 822 17 59 11 92 CENTOS
5 1 1 1 1 3 887 17 7 11 95 7
5 1 1 1 1 4 900 17 104 11 95 INSTALLATION
4 1 1 1 2 0 86 29 900 51 -1
5 1 1 1 2 1 86 35 15 45 12 4
5 1 1 1 2 2 825 30 27 40 50 Bes
5 1 1 1 2 3 952 29 34 40 51 Hel
4 1 1 1 3 0 34 91 87 17 -1
5 1 1 1 3 1 34 91 87 17 90 CentOS
4 1 1 1 4 0 2 116 9 8 -1
5 1 1 1 4 1 2 116 9 8 0 ‘
4 1 1 1 5 0 184 573 57 14 -1
5 1 1 1 5 1 184 573 57 14 90 Complete!
4 1 1 1 6 0 634 606 358 14 -1
5 1 1 1 6 1 634 606 43 10 89 CentOS
5 1 1 1 6 2 683 609 7 7 96 is
5 1 1 1 6 3 696 609 24 7 96 now
5 1 1 1 6 4 725 606 67 14 96 successfully
5 1 1 1 6 5 797 606 45 10 96 installed
5 1 1 1 6 6 848 606 18 10 96 and
5 1 1 1 6 7 872 599 29 25 96 ready
5 1 1 1 6 8 906 599 15 25 95 for
5 1 1 1 6 9 928 609 20 11 96 you
5 1 1 1 6 10 953 608 12 8 96 to
5 1 1 1 6 11 971 606 21 10 95 use!
4 1 1 1 7 0 775 623 217 14 -1
5 1 1 1 7 1 775 623 15 10 95 Go
5 1 1 1 7 2 796 623 31 10 96 ahead
5 1 1 1 7 3 833 623 18 10 96 and
5 1 1 1 7 4 857 623 38 10 96 reboot
5 1 1 1 7 5 900 625 12 8 96 to
5 1 1 1 7 6 918 625 25 8 95 start
5 1 1 1 7 7 949 626 28 11 96 using
5 1 1 1 7 8 983 623 9 10 93 it!

如您所见,“重启”文本未显示。也许是因为字体?还是颜色?

最佳答案

这里有两种不同的方法:

<强>1。传统图像处理和轮廓滤波

主要思想是提取 ROI,然后应用 Tesseract OCR。

  • 将图像转换为灰度和高斯模糊
  • 自适应阈值
  • 寻找轮廓
  • 遍历轮廓并使用轮廓近似和面积进行过滤
  • 提取投资返回率

一旦我们从自适应阈值化获得二值图像,我们就会找到轮廓并使用 cv2.arcLength()cv2.approxPolyDP() 的轮廓近似进行过滤。如果轮廓有四个点,我们假设它是矩形或正方形。此外,我们使用轮廓区域应用第二个过滤器,以确保我们隔离正确的 ROI。这是提取的投资返回率

enter image description here

import cv2

image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)

cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

ROI_number = 0
for c in cnts:
area = cv2.contourArea(c)
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.05 * peri, True)
if len(approx) == 4 and area > 2200:
x,y,w,h = cv2.boundingRect(approx)
ROI = image[y:y+h, x:x+w]
cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
ROI_number += 1

现在我们可以将其放入 Pytesseract 中。注意 Pytesseract 要求图像文本为黑色而背景为白色,因此我们首先进行一些预处理。这是 Pytesseract 的预处理图像和结果

enter image description here

Reboot

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = cv2.imread('ROI.png',0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

result = 255 - thresh

data = pytesseract.image_to_string(result, lang='eng',config='--psm 10 ')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()

通常,您还需要使用形态学变换来平滑图像,但对于这种情况,文本就足够了

<强>2。颜色阈值

第二种方法是使用具有下限和上限 HSV 阈值的颜色阈值来创建一个掩码,我们可以在其中提取 ROI。看here一个完整的例子。提取 ROI 后,我们按照相同的步骤对图像进行预处理,然后再将其放入 Pytesseract

关于python - 如何识别带有彩色背景图像的文本?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58327383/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com