
python - How to detect all the boxes used for entering letters in a particular field of a form?

Reposted. Author: 行者123. Updated: 2023-12-02 16:37:08

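I need to recognize text from a form that has a separate input box for each character.

I tried using a bounding box for each input field and cropping out that field, i.e. I can get all the boxes that belong to the "Name" field. However, when I try to detect the individual boxes within that group, I cannot: OpenCV returns a single contour covering all of the boxes. The file referenced in the for loop contains the bounding-box coordinates. cropped_img is the image of a single field's input (e.g. the name).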

Full image

This is the image of the form.

Cropped image of each field

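It contains many boxes for entering characters. Here the number of detected contours is always one. Why can't I detect all the individual boxes?
In short, I want all the individual boxes inside cropped_img.

Also, any other ideas for tackling OCR on forms like this would be much appreciated!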

import cv2
import numpy as np

# `file` is the open coordinates file and `panimg` is the full form image (both loaded elsewhere).
for line in file.read().split("\n"):
    if len(line) == 0:
        continue
    region = list(map(int, line.split(' ')[:-1]))
    index = line.split(' ')[-1]
    text = ''
    contentDict = {}
    # uzn in format left, up, width, height
    region[2] = region[0] + region[2]
    region[3] = region[1] + region[3]
    region = tuple(region)
    cropped_img = panimg[region[1]:region[3], region[0]:region[2]]

    index = index.replace('_', ' ')
    if index == 'sign' or index == 'picture' or index == 'Dec sign':
        continue

    kernel = np.ones((50, 50), np.uint8)
    gray = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2GRAY)
    ret, threshold = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    threshold = cv2.bitwise_not(threshold)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    ret, threshold = cv2.threshold(dilate, 127, 255, cv2.THRESH_BINARY)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # sorted() works whether findContours returns a list or a tuple;
    # get_contour_precedence is a helper defined elsewhere (not shown).
    contours = sorted(contours, key=lambda x: get_contour_precedence(x, panimg.shape[1]))

    print("Length of contours detected: ", len(contours))
    for j, ctr in enumerate(contours):
        # Get bounding box
        x, y, w, h = cv2.boundingRect(ctr)

        # Getting ROI
        roi = cropped_img[y:y+h, x:x+w]
        # show ROI
        cv2.imshow('segment no:' + str(j-1), roi)
        cv2.waitKey(0)

The contents of the file "file" are as follows:
462 545 468 39 AO_Office
450 785 775 39 Last_Name
452 836 770 37 First_Name
451 885 772 39 Middle_Name
241 963 973 87 Abbreviation_Name
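
The loop in the question converts each line from "left, up, width, height" into corner coordinates before cropping. A minimal standalone sketch of that parsing step is below; `FieldRegion` and `parse_regions` are hypothetical names introduced here for illustration, not part of the original code.

```python
from typing import List, NamedTuple


class FieldRegion(NamedTuple):
    """One form field as corner coordinates (hypothetical helper type)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def parse_regions(text: str) -> List[FieldRegion]:
    """Parse lines of 'left top width height Field_Name' into corner boxes."""
    regions = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, top, width, height = map(int, parts[:4])
        name = parts[4].replace("_", " ")
        regions.append(FieldRegion(name, left, top, left + width, top + height))
    return regions


sample = """462 545 468 39 AO_Office
450 785 775 39 Last_Name
"""
regions = parse_regions(sample)
for r in regions:
    print(r)
```

Each region can then be cropped with array slicing in the same way as the question does, e.g. `panimg[r.top:r.bottom, r.left:r.right]`.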

The expected output is one contour per individual box, i.e. one box for each letter entered in a field.
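
One likely contributor to the single contour is the 50x50 dilation kernel: dilation fills any gap narrower than roughly the kernel size, and `cv2.RETR_EXTERNAL` then reports only the outermost outline of the merged blob. The sketch below illustrates the merging effect on a single scan line in plain Python. It is a one-dimensional simplification, not the OpenCV implementation; the box pitch of 45 px and the 2 px wall thickness are assumptions chosen to resemble a form with boxes about 42 px wide, and `dilate_1d` and `count_runs` are hypothetical helpers.

```python
def dilate_1d(row, kernel_width):
    """Binary dilation of a 1-D row: a pixel becomes 1 if any pixel
    inside the kernel-wide window around it is 1."""
    half = kernel_width // 2
    n = len(row)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + kernel_width)  # window is kernel_width wide
        out.append(1 if any(row[lo:hi]) else 0)
    return out


def count_runs(row):
    """Count connected runs of 1s (the 1-D analogue of counting contours)."""
    runs, prev = 0, 0
    for value in row:
        if value and not prev:
            runs += 1
        prev = value
    return runs


# Assumed scan line across 10 boxes: a 2 px wide wall every 45 px.
row = [0] * 470
for k in range(11):
    start = 5 + 45 * k
    row[start] = row[start + 1] = 1

before = count_runs(row)                      # separate walls
after_large = count_runs(dilate_1d(row, 50))  # kernel wider than the gaps
after_small = count_runs(dilate_1d(row, 3))   # kernel much narrower than the gaps
print(before, after_large, after_small)
```

Under these assumptions the wide kernel fuses every wall into one run, while the narrow kernel leaves them separate, which suggests trying a much smaller kernel (or no dilation) together with a retrieval mode that also returns inner contours.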

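Best answer

I know I'm a bit late to the party :), but in case anyone is looking for a solution to this problem: I recently put together a Python package that deals with exactly this.
I called it BoxDetect, and it can be installed with: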

pip install boxdetect

You can then try something like this:

from boxdetect import config

config.min_w, config.max_w = (20,50)
config.min_h, config.max_h = (20,50)
config.scaling_factors = [0.4]
config.dilation_iterations = 0
config.wh_ratio_range = (0.5, 2.0)
config.group_size_range = (1, 100)
config.horizontal_max_distance_multiplier = 2


from boxdetect.pipelines import get_boxes

image_path = "dumpster/m1nda.jpg"
rects, grouped_rects, org_image, output_image = get_boxes(image_path, config, plot=False)


import matplotlib.pyplot as plt

print("======================")
print("Individual boxes (green): ", rects)
print("======================")
print("Grouped boxes (red): ", grouped_rects)
print("======================")
plt.figure(figsize=(25,25))
plt.imshow(output_image)
plt.show()

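It returns the bounding-rectangle coordinates of every rectangular box, the grouped boxes that form the long input fields, and a visualization drawn on the form image: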
Processing file:  dumpster/m1nda.jpg
======================
Individual boxes (green): [[1153 1873 26 26]
[1125 1873 24 27]
[1098 1873 24 26]
...
[ 558 551 42 28]
[ 514 551 42 28]
[ 468 551 42 28]]
======================
Grouped boxes (red): [(468, 551, 457, 29), (424, 728, 47, 45), (608, 728, 31, 45), (698, 728, 33, 45), (864, 728, 31, 45), (1059, 728, 47, 45), (456, 792, 763, 29), (456, 842, 763, 28), (456, 891, 763, 29), (249, 969, 961, 28), (249, 1017, 962, 28), (700, 1064, 39, 32), (870, 1064, 41, 32), (376, 1124, 45, 45), (626, 1124, 29, 45), (750, 1124, 27, 45), (875, 1124, 41, 45), (1054, 1124, 28, 45), (507, 1188, 706, 29), (507, 1238, 706, 28), (507, 1287, 706, 29), (718, 1335, 36, 31), (856, 1335, 35, 31), (1008, 1335, 34, 32), (260, 1438, 51, 37), (344, 1438, 56, 37), (505, 1443, 98, 27), (371, 1530, 31, 31), (539, 1530, 31, 31), (486, 1636, 694, 28), (486, 1684, 694, 28), (486, 1731, 694, 29), (486, 1825, 694, 29), (486, 1873, 694, 28)]
======================
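
To get back to the per-field boxes the question asks for, the individual rectangles can be matched against each field region from the coordinates file. The following is a minimal sketch in plain Python, assuming both the rectangles and the field regions are in (x, y, width, height) form as in the output and file shown above; `boxes_in_region` and its `tolerance` parameter are hypothetical and not part of BoxDetect.

```python
def boxes_in_region(rects, region, tolerance=5):
    """Return the rects (x, y, w, h) lying inside region (left, top, width,
    height), allowing a few pixels of slack, sorted left to right."""
    rx, ry, rw, rh = region
    inside = []
    for x, y, w, h in rects:
        if (x >= rx - tolerance and y >= ry - tolerance
                and x + w <= rx + rw + tolerance
                and y + h <= ry + rh + tolerance):
            inside.append((int(x), int(y), int(w), int(h)))
    return sorted(inside, key=lambda r: r[0])


# A few rectangles copied from the printed output above.
rects = [(1153, 1873, 26, 26), (558, 551, 42, 28),
         (514, 551, 42, 28), (468, 551, 42, 28)]
ao_office = (462, 545, 468, 39)  # AO_Office line from the coordinates file

cells = boxes_in_region(rects, ao_office)
print(cells)
```

Each cell can then be cropped from the page image with array slicing, as the question's code does for `roi`, and passed to a character recognizer one box at a time.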


Regarding "python - How to detect all the boxes used for entering letters in a particular field of a form?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56616550/
