python - How to slice a scanned page word by word with Python?

Reposted. Author: 太空狗. Updated: 2023-10-30 02:50:00

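Is there a way to split a scanned image of text into multiple images, each containing a single word? That is, if we scan a page containing n words, the script should produce n separate images.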

(Using Python)

Best answer

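This isn't an area I'm very familiar with, but assuming you can't use OCR (because your text is illegible or for some other reason), I would (perhaps naively) try something like this: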

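  • Load the image data into memory
  • Split the pixel data into the rows of the image
  • Find all "rows" that contain only white pixels: mark these as "white rows"
  • For each line of text between two "white rows", scan its "columns" to find white vertical gaps
  • Take all the resulting x,y coordinates and crop the image.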
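The steps above can be sketched in plain Python as projection-profile segmentation. This is a minimal sketch, not the answer's code: it assumes the page has already been converted to a 2-D list of grayscale values (0 = black, 255 = white), and the names `find_text_lines`, `find_words`, `segment_words`, `crop` and the `threshold`/`min_gap` parameters are hypothetical. Unlike the answer, it looks for runs of rows and columns that contain ink rather than for runs of white, and it merges gaps narrower than `min_gap` columns, which plays the role of `KERNING`.

```python
# Projection-profile word segmentation on a grayscale grid (illustrative sketch).
# Assumption: img is a list of rows, each a list of ints 0..255 (0 = black).
# All names and parameters here are hypothetical, not from the answer.


def _runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def find_text_lines(img, threshold=250):
    """Return (top, bottom) row ranges that contain at least one dark pixel."""
    has_ink = [any(v < threshold for v in row) for row in img]
    return _runs(has_ink)


def find_words(img, top, bottom, min_gap=3, threshold=250):
    """Return (left, right) column ranges of the words in rows top..bottom-1.

    Blank gaps narrower than min_gap columns are treated as spacing
    between letters of the same word and are merged away.
    """
    width = len(img[0])
    has_ink = [any(img[y][x] < threshold for y in range(top, bottom))
               for x in range(width)]
    spans = []
    for left, right in _runs(has_ink):
        if spans and left - spans[-1][1] < min_gap:
            spans[-1] = (spans[-1][0], right)  # narrow gap: same word
        else:
            spans.append((left, right))
    return spans


def segment_words(img, min_gap=3, threshold=250):
    """Return word bounding boxes as (x0, y0, x1, y1), ends exclusive."""
    boxes = []
    for top, bottom in find_text_lines(img, threshold):
        for left, right in find_words(img, top, bottom, min_gap, threshold):
            boxes.append((left, top, right, bottom))
    return boxes


def crop(img, box):
    """Cut one bounding box out of the grid."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


W, B = 255, 0
page = [
    [W, W, W, W, W, W, W, W, W, W],
    [W, B, W, B, W, W, W, W, B, W],
    [W, B, W, B, W, W, W, W, B, W],
    [W, W, W, W, W, W, W, W, W, W],
]

# The 1-column gap at x=2 is merged (letter spacing); the 4-column gap splits words.
print(segment_words(page))       # [(1, 1, 4, 3), (8, 1, 9, 3)]
print(crop(page, (1, 1, 4, 3)))  # [[0, 255, 0], [0, 255, 0]]
```

Searching for ink runs directly means a page whose text starts at the very top edge still yields its first line. The trade-off is that a single stray dark pixel in a margin is enough to start a new "line" or "word", so real scans usually need some denoising or a tolerance before this step.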

Actually, this sounded like a fun exercise, so I gave it a try with the pyPNG module:

```python
import sys

import png  # pyPNG (pip install pypng)

# A white gap only counts as a word break once it is at least this many
# columns past the last dark column; narrower gaps are letter spacing.
KERNING = 3


def find_rows(pixels, width, height):
    """Find all rows that are purely white.

    Returns the index of the first row of each run of white rows.
    `pixels` is a flat RGBA list (4 values per pixel).
    """
    white_rows = []
    is_white = False
    for y in range(height):
        # Sum R, G and B over the whole row; a white row scores ~255 per channel.
        row_sum = sum(sum(pixels[(y * 4 * width) + x * 4 + p] for p in range(3))
                      for x in range(width))
        if row_sum >= width * 3 * 254:
            if not is_white:
                white_rows.append(y)
            is_white = True
        else:
            is_white = False
    return white_rows


def find_words_in_image(blob):
    """Write each detected word to word1.png, word2.png, ... and return the count."""
    n = 0
    r = png.Reader(bytes=blob)
    width, height, pixel_rows, meta = r.asRGBA8()
    # Flatten the rows into one list: R, G, B, A, R, G, B, A, ...
    pixels = []
    for row in pixel_rows:
        pixels.extend(row)
    # Find each horizontal white line.
    white_rows = find_rows(pixels, width, height)
    # For each text line (the band between two white runs) find white vertical gaps.
    for i, y in enumerate(white_rows):
        if i >= len(white_rows) - 1:
            continue  # the last white run has no following run to pair with
        y2 = white_rows[i + 1]
        height_of_row = y2 - y
        is_white = False
        white_cols = []
        last_black = -100
        for x in range(width - 4):
            s = y * 4 * width + x * 4
            # Sum R, G and B down column x within this band.
            col_sum = sum(pixels[s + y3 * 4 * width]
                          + pixels[s + y3 * 4 * width + 1]
                          + pixels[s + y3 * 4 * width + 2]
                          for y3 in range(height_of_row))
            if col_sum >= height_of_row * 3 * 240:
                if not is_white:
                    if len(white_cols) > 0 and x - last_black < KERNING:
                        continue  # too close to the last ink: gap between letters
                    white_cols.append(x)
                is_white = True
            else:
                is_white = False
                last_black = x
        # Now we have a list of x, y coordinates for all the words on this line.
        for j, x in enumerate(white_cols):
            if j >= len(white_cols) - 1:
                continue
            wordpx = []
            new_width = white_cols[j + 1] - x
            new_height = y2 - y
            x_offset = x * 4
            for h in range(new_height):
                y_offset = (y + h) * 4 * width
                start = x_offset + y_offset
                wordpx.append(pixels[start:start + (new_width * 4)])
            n += 1
            # PNG data is binary, so the file must be opened in 'wb' mode.
            with open('word%s.png' % n, 'wb') as f:
                w = png.Writer(width=new_width, height=new_height, alpha=True)
                w.write(f, wordpx)
    return n


if __name__ == "__main__":
    #
    # USAGE: python png2words.py yourpic.png
    #
    # OUTPUT: [word1.png...word2.png...wordN.png]
    #
    with open(sys.argv[1], 'rb') as f:
        n = find_words_in_image(f.read())
    print("found %s words" % n)
```
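Most of the arithmetic in the answer's code is indexing into the flat `pixels` list, where channel p of pixel (x, y) sits at `y*4*width + x*4 + p`. The helpers below are hypothetical (not part of pyPNG or the answer); they isolate that indexing, the per-pixel version of the white test, and the crop loop so they can be checked on a tiny hand-made image. They assume the same RGBA layout: 4 values per pixel, rows stored back to back.

```python
# Flat RGBA indexing, assuming the layout of the answer's `pixels` list:
# 4 values (R, G, B, A) per pixel, image rows stored one after another.
# The helper names are hypothetical.

CHANNELS = 4  # R, G, B, A


def index_of(x, y, width, channel=0):
    """Position of `channel` of pixel (x, y): y*4*width + x*4 + channel."""
    return y * CHANNELS * width + x * CHANNELS + channel


def is_white(pixels, width, x, y, level=240):
    """The answer's column test applied to one pixel: R+G+B >= 3*level."""
    i = index_of(x, y, width)
    return sum(pixels[i:i + 3]) >= 3 * level


def crop_rgba(pixels, width, x0, y0, x1, y1):
    """Rows y0..y1-1, columns x0..x1-1, as one flat RGBA list per row."""
    rows = []
    for y in range(y0, y1):
        start = index_of(x0, y, width)
        rows.append(pixels[start:start + (x1 - x0) * CHANNELS])
    return rows


# A 3x2 test image (width 3, height 2).
width = 3
pixels = [
    0, 0, 0, 255,   255, 255, 255, 255,   1, 2, 3, 255,   # row 0
    4, 5, 6, 255,   250, 245, 240, 255,   7, 8, 9, 255,   # row 1
]

print(index_of(1, 1, width))          # 16
print(is_white(pixels, width, 1, 1))  # True (250 + 245 + 240 = 735 >= 720)
print(crop_rgba(pixels, width, 1, 0, 3, 2))
# [[255, 255, 255, 255, 1, 2, 3, 255], [250, 245, 240, 255, 7, 8, 9, 255]]
```

`crop_rgba` returns rows in the same shape the answer builds in `wordpx` before passing them to `w.write()`.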

Regarding python - How to slice a scanned page word by word with Python?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5251066/
