
python - How to apply smart thresholding to an image

Reposted. Author: 行者123. Updated: 2023-12-04 07:16:51

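I am writing an OCR application (for Hebrew script).
The first stage of the application is thresholding.
This is what my original image looks like:
[image: original image]
This is what it looks like after thresholding:
[image: after thresholding]
As you can see, it is mostly fine, but the "crowns" or "decorations" on the letters sometimes disappear, as in this word:
[image: pnei original]
which becomes:
[image: pnei threshold]
The problem is that after I apply RGB2GRAY to the original image, the black crowns are not really dark enough, so they turn white during thresholding. It is easy to see that they "should" be black; the question is how I should tell the algorithm to detect that.
My current thresholding code uses Otsu plus local (adaptive) thresholding. Here is the code: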

import cv2
import numpy as np
import pandas as pd

# NOTE: rgb2gray is the asker's own helper; it is not shown in the question.


def apply_threshold(img, is_cropped=False):
    '''
    This function applies a threshold on the image:
    first an Otsu threshold on the whole image, and afterwards adaptive thresholds
    whose block size is based on the size of the image.
    I apply a logical OR between all the thresholds, because my assumption is that
    a letter will always be black, while the background can sometimes be black and
    sometimes white - thus I need to apply OR to have the background white.
    '''
    if len(np.unique(img)) == 2:  # img is already binary
        # return img
        gray_img = rgb2gray(img)
        # Otsu needs 8-bit input, so cast here as in the main path below
        _, binary_img = cv2.threshold(gray_img.astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary_img
    gray_img = rgb2gray(img)
    _, binary_img = cv2.threshold(gray_img.astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    connectivity = 8
    # label the dark (ink) regions by inverting the binary image first
    output_stats = cv2.connectedComponentsWithStats(binary_img.max() - binary_img, connectivity, cv2.CV_32S)
    df = pd.DataFrame(output_stats[2], columns=['left', 'top', 'width', 'height', 'area'])[1:]
    if df['area'].max() / df['area'].sum() > 0.1 and is_cropped and False:  # branch disabled by `and False`
        # idxmax() returns the component label; argmax() would be off by one
        # because the background row was dropped from df
        is_largest = output_stats[1] == df['area'].idxmax()
        binary_copy = gray_img.copy()
        # Otsu threshold computed only from the pixels of the largest component
        TH1, _ = cv2.threshold(gray_img[is_largest].astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        binary_copy[is_largest & (gray_img > TH1)] = 255
        binary_copy[is_largest & (gray_img <= TH1)] = 0

        # separate Otsu threshold for all remaining pixels
        TH2, _ = cv2.threshold(gray_img[~is_largest].astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        binary_copy[~is_largest & (gray_img > TH2)] = 255
        binary_copy[~is_largest & (gray_img <= TH2)] = 0
        binary_img = binary_copy.astype('uint8')
    # N = [3, 5, 7, 9, 11, 13,27, 45] # sizes to divide the image shape in
    # N = [20,85]
    N = [3, 5, 25]
    min_dim = min(binary_img.shape)
    for n in N:
        block_size = max(3, int(min_dim / n))  # adaptiveThreshold needs a block size > 1
        if block_size % 2 == 0:
            block_size += 1  # block_size needs to be odd
        # OR keeps a pixel white if any of the thresholds says it is background
        binary_img = binary_img | cv2.adaptiveThreshold(gray_img.astype('uint8'), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                                        cv2.THRESH_BINARY, block_size, 10)

    return binary_img
Any ideas would be greatly appreciated!

Best answer

One approach is division normalization in Python/OpenCV.
Input:
[image]

import cv2
import numpy as np

# load image
img = cv2.imread("hebrew_text.jpg")

# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# blur
blur = cv2.GaussianBlur(gray, (99,99), 0)

# divide
divide = cv2.divide(gray, blur, scale=255)

# write result to disk
cv2.imwrite("hebrew_text_division.png", divide)

# display it
#cv2.imshow("thresh", thresh)
cv2.imshow("gray", gray)
cv2.imshow("divide", divide)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
[image]
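After doing this, you will probably need to threshold the result and then clean it up by finding contours and discarding any contour whose area is smaller than that of your smallest accent mark.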
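If possible, I would also suggest saving your images as PNG rather than JPG. JPG uses lossy compression and introduces color changes. This may be the source of some of the extraneous marks you are seeing in the background.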

Regarding python - how to apply smart thresholding to an image, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68714927/
