
ios - Swift iOS - Vision framework text recognition and rectangles


I'm trying to draw rectangles over the text regions found with the Vision framework, but they are always a bit off. I'm doing it like this:

    public func drawOccurrencesOnImage(_ occurrences: [CGRect], _ image: UIImage) -> UIImage? {
        UIGraphicsBeginImageContextWithOptions(image.size, false, 0.0)
        defer { UIGraphicsEndImageContext() }  // end the context even on an early return

        image.draw(at: .zero)

        let currentContext = UIGraphicsGetCurrentContext()
        currentContext?.addRects(occurrences)
        currentContext?.setStrokeColor(UIColor.red.cgColor)
        currentContext?.setLineWidth(2.0)
        currentContext?.strokePath()

        return UIGraphicsGetImageFromCurrentImageContext()
    }

But the returned image always looks close, yet never quite right:

(screenshots: the drawn boxes are consistently offset from the recognized text)

This is how I create the boxes, exactly as Apple does:

    let boundingRects: [CGRect] = observations.compactMap { observation in
        guard let candidate = observation.topCandidates(1).first else { return .zero }

        let stringRange = candidate.string.startIndex..<candidate.string.endIndex
        let boxObservation = try? candidate.boundingBox(for: stringRange)
        let boundingBox = boxObservation?.boundingBox ?? .zero

        return VNImageRectForNormalizedRect(boundingBox,
                                            Int(UIViewController.chosenImage?.width ?? 0),
                                            Int(UIViewController.chosenImage?.height ?? 0))
    }

(Source: https://developer.apple.com/documentation/vision/recognizing_text_in_images)

Thanks.

Best answer

VNImageRectForNormalizedRect returns a CGRect whose y-coordinate is flipped. (I suspect it was written for macOS, which uses a different coordinate system than iOS.)
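The essence of the fix can be sketched with plain CGRect arithmetic (the numbers here are hypothetical): Vision's normalized rects have their origin at the bottom-left, so the y-origin must be recomputed from the rect's maxY before scaling up to pixels.

```swift
import Foundation

// Hypothetical example: a normalized Vision box in a 1000 × 600 pixel image.
let normalized = CGRect(x: 0.1, y: 0.2, width: 0.5, height: 0.3)
let imageWidth: CGFloat = 1000
let imageHeight: CGFloat = 600

// VNImageRectForNormalizedRect would simply scale, leaving y measured
// from the bottom edge (0.2 * 600 = 120). Flipping uses maxY instead:
let flipped = CGRect(x: normalized.origin.x * imageWidth,
                     y: (1 - normalized.maxY) * imageHeight, // (1 - 0.5) * 600 = 300, from the top
                     width: normalized.width * imageWidth,
                     height: normalized.height * imageHeight)
// flipped is (100.0, 300.0, 500.0, 180.0)
```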

Instead, I might suggest this adaptation of the `boundingBox` method from Detecting Objects in Still Images:

/// Convert Vision coordinates to pixel coordinates within image.
///
/// Adapted from the `boundingBox` method from
/// [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
/// This flips the y-axis.
///
/// - Parameters:
///   - boundingBox: The bounding box returned by the Vision framework.
///   - bounds: The bounds within the image (in pixels, not points).
///
/// - Returns: The bounding box in pixel coordinates, flipped vertically so (0, 0) is in the upper-left corner.
func convert(boundingBox: CGRect, to bounds: CGRect) -> CGRect {
    let imageWidth = bounds.width
    let imageHeight = bounds.height

    // Begin with input rect.
    var rect = boundingBox

    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.minX
    rect.origin.y = (1 - rect.maxY) * imageHeight + bounds.minY

    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight

    return rect
}

Note that I changed the method name, because it does not return a bounding box, but rather converts the bounding box (whose values are in [0, 1]) into a CGRect. I also fixed a minor bug in their boundingBox implementation. But it captures the main idea, namely flipping the y-axis of the bounding box.

Anyway, this produces the correct boxes:

(screenshot: the boxes now line up with the recognized text)


For example:

func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .up)

    // Note: in pixels, from `cgImage`; this assumes you have already rotated the image, too.
    let size = CGSize(width: cgImage.width, height: cgImage.height)
    let bounds = CGRect(origin: .zero, size: size)

    // Create a new request to recognize text.
    let request = VNRecognizeTextRequest { [self] request, error in
        guard
            let results = request.results as? [VNRecognizedTextObservation],
            error == nil
        else { return }

        let rects = results.map {
            convert(boundingBox: $0.boundingBox, to: bounds)
        }

        let string = results.compactMap {
            $0.topCandidates(1).first?.string
        }.joined(separator: "\n")

        let format = UIGraphicsImageRendererFormat()
        format.scale = 1
        let final = UIGraphicsImageRenderer(bounds: bounds, format: format).image { _ in
            image.draw(in: bounds)
            UIColor.red.setStroke()
            for rect in rects {
                let path = UIBezierPath(rect: rect)
                path.lineWidth = 5
                path.stroke()
            }
        }

        DispatchQueue.main.async { [self] in
            imageView.image = final
            label.text = string
        }
    }

    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try imageRequestHandler.perform([request])
        } catch {
            print("Failed to perform image request: \(error)")
        }
    }
}

/// Convert Vision coordinates to pixel coordinates within image.
///
/// Adapted from the `boundingBox` method from
/// [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
/// This flips the y-axis.
///
/// - Parameters:
///   - boundingBox: The bounding box returned by the Vision framework.
///   - bounds: The bounds within the image (in pixels, not points).
///
/// - Returns: The bounding box in pixel coordinates, flipped vertically so (0, 0) is in the upper-left corner.
func convert(boundingBox: CGRect, to bounds: CGRect) -> CGRect {
    let imageWidth = bounds.width
    let imageHeight = bounds.height

    // Begin with input rect.
    var rect = boundingBox

    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.minX
    rect.origin.y = (1 - rect.maxY) * imageHeight + bounds.minY

    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight

    return rect
}

/// Scale and orient picture for the Vision framework
///
/// From [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
///
/// - Parameter image: Any `UIImage` with any orientation
/// - Returns: An image that has been rotated such that it can be safely passed to the Vision framework for detection.
func scaleAndOrient(image: UIImage) -> UIImage {

    // Set a default value for limiting image size.
    let maxResolution: CGFloat = 640

    guard let cgImage = image.cgImage else {
        print("UIImage has no CGImage backing it!")
        return image
    }

    // Compute parameters for transform.
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)
    var transform = CGAffineTransform.identity

    var bounds = CGRect(x: 0, y: 0, width: width, height: height)

    if width > maxResolution || height > maxResolution {
        let ratio = width / height
        if width > height {
            bounds.size.width = maxResolution
            bounds.size.height = round(maxResolution / ratio)
        } else {
            bounds.size.width = round(maxResolution * ratio)
            bounds.size.height = maxResolution
        }
    }

    let scaleRatio = bounds.size.width / width
    let orientation = image.imageOrientation
    switch orientation {
    case .up:
        transform = .identity
    case .down:
        transform = CGAffineTransform(translationX: width, y: height).rotated(by: .pi)
    case .left:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: 0, y: width).rotated(by: 3.0 * .pi / 2.0)
    case .right:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: 0).rotated(by: .pi / 2.0)
    case .upMirrored:
        transform = CGAffineTransform(translationX: width, y: 0).scaledBy(x: -1, y: 1)
    case .downMirrored:
        transform = CGAffineTransform(translationX: 0, y: height).scaledBy(x: 1, y: -1)
    case .leftMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: width).scaledBy(x: -1, y: 1).rotated(by: 3.0 * .pi / 2.0)
    case .rightMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(scaleX: -1, y: 1).rotated(by: .pi / 2.0)
    default:
        transform = .identity
    }

    return UIGraphicsImageRenderer(size: bounds.size).image { rendererContext in
        let context = rendererContext.cgContext

        if orientation == .right || orientation == .left {
            context.scaleBy(x: -scaleRatio, y: scaleRatio)
            context.translateBy(x: -height, y: 0)
        } else {
            context.scaleBy(x: scaleRatio, y: -scaleRatio)
            context.translateBy(x: 0, y: -height)
        }
        context.concatenate(transform)
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    }
}
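For completeness, a hypothetical call site (`pickedImage` is an assumed name for an image obtained from, say, an image picker). Normalizing the image first is what makes the `orientation: .up` passed to VNImageRequestHandler in recognizeText(in:) actually true:

```swift
// Hypothetical usage sketch: normalize orientation and size before recognition,
// so that the pixel coordinates returned by `convert` map directly onto the image.
let prepared = scaleAndOrient(image: pickedImage)
recognizeText(in: prepared)
```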

Regarding "ios - Swift iOS - Vision framework text recognition and rectangles", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73397910/
