python - Merging regions in MSER to identify text lines in OCR

Reposted by: 太空狗 · Updated: 2023-10-30 00:42:42

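I am using MSER to identify text regions in an image. I use the following code to extract the regions and save them as images. Currently, each identified region is saved as a separate image. However, I want to merge the regions that belong to one line of text into a single image.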

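import cv2

img = cv2.imread('newF.png')
mser = cv2.MSER_create()

# Upscale the image by a factor of 2 in both dimensions
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()

# detectRegions returns (point sets, bounding boxes); regions[0] holds the point sets
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))  # Draw each convex hull as a closed green outline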

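How can I stitch together the images that belong to the same line? As I understand it, the logic will mostly rely on some heuristic that identifies regions with nearby y coordinates.

But how do I merge these regions in OpenCV? I am new to OpenCV, so this is the part I am missing. Any help would be appreciated.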

Sample image attached.

The desired output is as follows:

Another line:

Best answer

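If you are particularly keen on using MSER, then, as you mentioned, you can use a heuristic that combines regions with nearby y coordinates. The following approach may not be efficient, and I will try to optimize it, but it should give you an idea of how to tackle the problem.

First, let us draw all the bboxes determined by MSER: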

coordinates, bboxes = mser.detectRegions(gray)
for bbox in bboxes:
    x, y, w, h = bbox
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

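This gives us:

Now, it is apparent from the bboxes that the heights vary a lot, even within a single line. So, to cluster the bounding boxes into lines, we have to come up with an interval. I could not think of a foolproof one, so I chose half of the median of all the bbox heights, which works well for the given case.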

bboxes_list = list()
heights = list()
for bbox in bboxes:
    x, y, w, h = bbox
    bboxes_list.append([x, y, x + w, y + h])  # Create list of bounding boxes, with each bbox containing the left-top and right-bottom coordinates
    heights.append(h)
heights = sorted(heights)  # Sort heights
median_height = heights[len(heights) // 2] / 2  # Find half of the median height (integer division keeps the index an int in Python 3)
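
As a side note, the same interval can be computed with the standard library instead of sorting and indexing by hand. This is only a minimal sketch: the helper name `half_median_height` and the sample boxes are made up for illustration, and the boxes are assumed to be `(x, y, w, h)` tuples like the ones `detectRegions` returns. `statistics.median_high` is used because, like the indexing above, it picks the upper middle element when the number of heights is even.

```python
from statistics import median_high


def half_median_height(bboxes):
    """Return half of the (upper) median height of (x, y, w, h) boxes."""
    heights = [h for (_x, _y, _w, h) in bboxes]
    return median_high(heights) / 2


# Hypothetical boxes; only the heights (last value) matter here.
sample = [(0, 0, 10, 20), (0, 0, 10, 30), (0, 0, 10, 10), (0, 0, 10, 40)]
print(half_median_height(sample))  # 15.0
```

The sorted heights are 10, 20, 30, 40, so the upper median is 30 and the interval is 15.0.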

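Now, to group the bounding boxes given a particular interval on the y coordinate (here, half the median height), I am adapting a snippet I once found on Stack Overflow (I will add the source once I find it). The function takes a list and an interval as input and returns a list of groups, where each group contains bounding boxes whose y coordinate differs from that of the preceding box by no more than the interval. Note that the iterable/list needs to be sorted by y coordinate.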
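```
def grouper(iterable, interval=2):
    """Yield groups of consecutive boxes whose y1 values differ by at most interval."""
    prev = None
    group = []
    for item in iterable:
        # Compare y1 (index 1) of the current box with that of the previous box
        if prev is None or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group
```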
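
To make the grouping behaviour concrete, here is a small worked example on made-up boxes. The box values and the interval of 5 are assumptions chosen for illustration, and the function is repeated so that the snippet runs on its own.

```python
def grouper(iterable, interval=2):
    """Yield groups of consecutive boxes whose y1 values differ by at most interval."""
    prev = None
    group = []
    for item in iterable:
        if prev is None or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group


# Hypothetical [x1, y1, x2, y2] boxes: three near y = 10 and two near y = 50.
boxes = [
    [0, 50, 5, 60],
    [0, 10, 5, 20],
    [6, 52, 11, 62],
    [12, 12, 17, 22],
    [6, 11, 11, 21],
]
boxes = sorted(boxes, key=lambda b: b[1])  # grouper expects boxes sorted by y1
lines = list(grouper(boxes, interval=5))

print(len(lines))                     # 2
print([len(line) for line in lines])  # [3, 2]
```

One thing to keep in mind: each box is compared with the box just before it, not with the first box of the group, so a long run of boxes that each drift slightly downward can end up chained into one group.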

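So, before grouping the bounding boxes, they need to be sorted by y coordinate. After grouping, we iterate over each group and determine the minimum x coordinate, minimum y coordinate, maximum x coordinate, and maximum y coordinate needed to draw one bounding box that covers all the bounding boxes in that group.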

bboxes_list = sorted(bboxes_list, key=lambda k: k[1])  # Sort the bounding boxes based on y1 coordinate ( y of the left-top coordinate )
combined_bboxes = grouper(bboxes_list, median_height)  # Group the bounding boxes
for group in combined_bboxes:
    x_min = min(group, key=lambda k: k[0])[0]  # Find min of x1
    x_max = max(group, key=lambda k: k[2])[2]  # Find max of x2
    y_min = min(group, key=lambda k: k[1])[1]  # Find min of y1
    y_max = max(group, key=lambda k: k[3])[3]  # Find max of y2
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
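
Since the question asks for one image per text line rather than drawn rectangles, the merged box can also be used to crop the image. The following is only a sketch under a few assumptions: `merge_boxes` and `save_line_crops` are hypothetical helper names, the `line_<index>.png` file naming is made up, and `img` is assumed to be the NumPy array returned by `cv2.imread`.

```python
def merge_boxes(group):
    """Return (x_min, y_min, x_max, y_max) covering every [x1, y1, x2, y2] box in group."""
    x_min = min(box[0] for box in group)
    y_min = min(box[1] for box in group)
    x_max = max(box[2] for box in group)
    y_max = max(box[3] for box in group)
    return x_min, y_min, x_max, y_max


def save_line_crops(img, groups, prefix="line"):
    """Crop one sub-image per group and write it to '<prefix>_<index>.png' (assumed naming)."""
    import cv2  # Imported here so merge_boxes stays usable without OpenCV installed

    for index, group in enumerate(groups):
        x_min, y_min, x_max, y_max = merge_boxes(group)
        crop = img[y_min:y_max, x_min:x_max]  # NumPy slicing: rows are y, columns are x
        cv2.imwrite("{}_{}.png".format(prefix, index), crop)


# Hypothetical group of three boxes on one line.
print(merge_boxes([[0, 10, 5, 20], [6, 12, 11, 22], [12, 8, 17, 21]]))  # (0, 8, 17, 22)
```

If you try this, pass a copy of the image taken before any rectangles were drawn on it; otherwise the green outlines will show up in the saved crops.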

Final result image: