python - How to print tesseract results in Chinese characters


Reposted by: 行者123 Updated: 2023-12-04 14:18:26

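I'm trying to get my program to recognize Chinese with Tesseract, and the recognition itself works. The only problem is printing the result as Chinese characters: the output comes out in pinyin (the romanized spelling you use to type Chinese words on an English keyboard).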

# Import libraries
from PIL import Image
import pytesseract
from unidecode import unidecode

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image_counter = 2

filelimit = image_counter - 1

outfile = "out_text.txt"

f = open(outfile, "a")

for i in range(1, filelimit + 1):
    print("ran")
    filename = "page_" + str(i) + ".png"

    # Recognize the text in the image as a string using pytesseract
    text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))

    print(text)

This is the image I ran it on:

This is what I get:

ran
清明世解与分分,陆商行人与断缺新文旧家何出友，木易通之强化村。

The result should be the Chinese characters shown in the image.

Best answer

Never mind, I figured out my problem.

text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))

should be

text = pytesseract.image_to_string(Image.open(filename), lang = "chi_tra")
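
The part of this change that removes the pinyin is dropping `unidecode()`: that function transliterates Unicode text to plain ASCII, so it turns each recognized Chinese character into a romanized syllable. The switch from `chi_sim` to `chi_tra` is a separate choice that selects Tesseract's Traditional Chinese model instead of the Simplified one. Below is a minimal sketch of the corrected loop, not the asker's final code. The helper names `has_cjk`, `append_text`, `ocr_page`, and `main` are made up for illustration, and the sketch assumes the `page_N.png` naming from the question and that `pytesseract` can locate the Tesseract binary (on Windows, set `tesseract_cmd` as in the question).

```python
def has_cjk(text):
    """Return True if text contains at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def append_text(outfile, text):
    """Append text to outfile as UTF-8 so Chinese characters are stored intact."""
    # An explicit encoding avoids depending on the platform default,
    # which on Windows may not be able to represent Chinese characters.
    with open(outfile, "a", encoding="utf-8") as f:
        f.write(text)


def ocr_page(filename, lang="chi_sim"):
    """Run Tesseract on one image and return its text with no transliteration."""
    # Imported here so the helpers above can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(filename) as image:
        # No unidecode() wrapper: the returned str keeps the Chinese characters.
        return pytesseract.image_to_string(image, lang=lang)


def main(page_count=1, outfile="out_text.txt"):
    """OCR page_1.png .. page_<page_count>.png, print and save the text."""
    for i in range(1, page_count + 1):
        text = ocr_page(f"page_{i}.png")
        if not has_cjk(text):
            # Hypothetical sanity check: pure-ASCII output suggests the text
            # was transliterated or the wrong language model was used.
            print("warning: no Chinese characters in the OCR output")
        print(text)
        append_text(outfile, text)
```

Calling `main()` from a directory containing `page_1.png` runs the loop once. Pass `lang="chi_tra"` to `ocr_page` if the image contains Traditional characters.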

Regarding python - How to print tesseract results in Chinese characters, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57866592/
