
c# - Tesseract OCR: very inaccurate result

Reposted. Author: 行者123. Updated: 2023-11-30 12:41:21

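Below is a very simple program I use to test Tesseract's performance. The image is a high-quality, very clear screenshot (not a complex picture with colors), yet the results are not what I expected. Please see my code and the results below. I'm not sure whether I'm doing something wrong or whether the Tesseract engine simply can't handle this.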

[Images not preserved: the original post attached five images here.]

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using tessnet2;

namespace ImageProcessTesting
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Screen region to capture: upper-left and bottom-right corners, in pixels.
            int up_lef_x = 1075;
            int up_lef_y = 70;

            int bo_rig_x = 1430;
            int bo_rig_y = 95;

            int width = bo_rig_x - up_lef_x;
            int height = bo_rig_y - up_lef_y;

            using (var bmpScreenshot = new Bitmap(width, height, PixelFormat.Format32bppArgb))
            {
                using (var gfxScreenshot = Graphics.FromImage(bmpScreenshot))
                {
                    // Copy just the width x height block starting at the upper-left corner,
                    // instead of requesting a block the size of the whole screen.
                    gfxScreenshot.CopyFromScreen(
                        up_lef_x,
                        up_lef_y,
                        0,
                        0,
                        new Size(width, height),
                        CopyPixelOperation.SourceCopy);
                }

                // bmpScreenshot.Save("C:\\Users\\Exa\\Screenshot.png", ImageFormat.Png);

                // Point tessnet2 at the tessdata folder and select the English language data.
                var ocr = new Tesseract();
                ocr.Init(@"C:\Users\Exa\Desktop\tessdata", "eng", false);
                var result = ocr.DoOCR(bmpScreenshot, Rectangle.Empty);

                // Separate the recognized words with spaces; plain concatenation runs them together.
                string result_str = string.Join(" ", result.Select(word => word.Text));
                MessageBox.Show(result_str);
            }
        }
    }
}

Best answer

Screenshots at 96 DPI are usually not enough for OCR. As the Tesseract wiki puts it:

There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
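To make the quoted rule of thumb concrete, here is a small arithmetic helper. It assumes an x-height of about 0.48 em, a ratio back-derived from the wiki's own figures (10pt at 300 dpi is about 41.7 px per em, with an x-height of about 20 px); real fonts vary a lot, so treat the results as rough estimates. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for the x-height rule of thumb from the Tesseract wiki.
public static class XHeightEstimate
{
    // Assumption: x-height is about 0.48 em, derived from the wiki's example
    // (10pt at 300 dpi = ~41.7 px em, x-height ~20 px). Real fonts vary widely.
    public const double XHeightToEmRatio = 0.48;

    // One point is 1/72 inch, so the em size in pixels is points * dpi / 72.
    public static double EmPixels(double points, double dpi) => points * dpi / 72.0;

    // Estimated pixel height of a lowercase 'x'.
    public static double XHeightPixels(double points, double dpi) =>
        EmPixels(points, dpi) * XHeightToEmRatio;

    // Smallest whole-number upscale factor that lifts the estimated x-height
    // to at least targetPixels (20 px matches the 10pt x 300 dpi case).
    public static int ScaleFactorFor(double points, double dpi, double targetPixels = 20.0)
    {
        double current = XHeightPixels(points, dpi);
        // Subtract a tiny epsilon so floating-point rounding (e.g. 19.999999999999996)
        // does not push an exact match up to the next integer.
        return Math.Max(1, (int)Math.Ceiling(targetPixels / current - 1e-9));
    }
}
```

For example, 9pt text captured at 96 DPI has a 12 px em and an estimated x-height of only about 5.8 px, below the 8 px level at which the wiki says most text gets "noise removed"; `XHeightEstimate.ScaleFactorFor(9, 96)` returns 4.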

However, if you know exactly which font is used, you can try retraining Tesseract on it to get better results.
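Short of retraining, a common workaround in the spirit of the quoted guidance is to enlarge the captured bitmap before OCR so the x-height moves toward the ~20 px range. This is an assumption, not part of the original answer: `OcrPreprocess.Upscale` is a hypothetical helper built only on standard System.Drawing calls, and resampling adds pixels rather than real detail, so the gain will vary with the font.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical preprocessing helper (not part of tessnet2).
public static class OcrPreprocess
{
    // Returns a new bitmap enlarged by an integer factor using bicubic resampling.
    // The caller owns, and should dispose, the returned bitmap.
    public static Bitmap Upscale(Bitmap source, int factor)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (factor < 1) throw new ArgumentOutOfRangeException(nameof(factor));

        // 24bpp drops the alpha channel, which a screen capture does not need.
        var scaled = new Bitmap(source.Width * factor, source.Height * factor, PixelFormat.Format24bppRgb);
        using (var g = Graphics.FromImage(scaled))
        using (var attributes = new ImageAttributes())
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            // Mirror edge pixels so the resampler does not blend in
            // pixels from outside the image along the borders.
            attributes.SetWrapMode(WrapMode.TileFlipXY);
            g.DrawImage(
                source,
                new Rectangle(0, 0, scaled.Width, scaled.Height),
                0, 0, source.Width, source.Height,
                GraphicsUnit.Pixel,
                attributes);
        }
        return scaled;
    }
}
```

In the question's click handler this would wrap the OCR call, e.g. `using (var big = OcrPreprocess.Upscale(bmpScreenshot, 4)) { var result = ocr.DoOCR(big, Rectangle.Empty); /* ... */ }`. The factor 4 is only an illustrative guess; choose it so the measured x-height of the captured text lands around 20 px.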

Regarding c# - Tesseract OCR: very inaccurate result, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37782246/
