
c# - How can I make Tesseract OCR output words as a sentence?

Reposted. Author: 行者123. Updated: 2023-11-30 23:29:51

Sample Image Input

The result I get looks like this: http://i.stack.imgur.com/dM0qG.png

Is it possible to make Tesseract output the text in sentence/paragraph form, like this?

This is to certify that you have successfully PASSED the PHIL-IT General Certification Examination held on January 26, 2015 at the Cebu Institute of Technology - University, N. Bacalso Avenue, Cebu City 6000 Philippines.

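Best answer

Because result is a Tessnet2.WordList, and the text of each Word is stored in its item.Text, you can:

  1. Create a list containing only the word text (not the full Tessnet2.Word objects)
  2. Join that list, using a space as the separator

Assuming your result is stored in a var named result (that is, you ran
var result = ocr.DoOCR(image, null);
), combining the two steps looks like this: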

string phrase = string.Join(" ", result.Select(x => x.Text).ToList());

The result is:

This is to certify that you have successfully PASSED the Phil-lT General Certification Examination held on [nnuag 26, 2015 at the Cebu Institute uf Tedmnlngy · University , N. Bacalso Avenue, Cebu City 6000 Philippines.

(It contains some recognition errors, but that is a separate issue.)
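The join step can also be tried without the OCR library installed. The following is only a minimal sketch: the Word class is a hypothetical stand-in that mimics the Text property described above (it is not the real Tessnet2 type), and SentenceBuilder and ToSentence are names invented for illustration. Skipping empty entries is an extra assumption, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the OCR word type; only Text is used here.
public class Word
{
    public string Text { get; set; }
}

public static class SentenceBuilder
{
    // Projects each word to its text, skips empty entries (an assumption,
    // not in the original answer), and joins the rest with single spaces.
    public static string ToSentence(IEnumerable<Word> words)
    {
        return string.Join(" ",
            words.Select(w => w.Text)
                 .Where(t => !string.IsNullOrWhiteSpace(t)));
    }
}

public static class Program
{
    public static void Main()
    {
        // Hand-made sample data standing in for the OCR result.
        var result = new List<Word>
        {
            new Word { Text = "This" },
            new Word { Text = "is" },
            new Word { Text = "to" },
            new Word { Text = "certify" }
        };

        Console.WriteLine(SentenceBuilder.ToSentence(result)); // prints "This is to certify"
    }
}
```

Since string.Join has an overload that accepts an IEnumerable<string>, the sketch passes the projected sequence directly, without calling ToList() first.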

Regarding "c# - How can I make Tesseract OCR output words as a sentence?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35200621/
