
tesseract - Training Tesseract for a new font

Reposted. Author: 行者123. Updated: 2023-12-03 06:57:19

When creating the clustering data with

mftraining -F font_properties -U unicharset -O lan.unicharset *.tr

I get the following messages:

C:\Users\ \AppData\Local\Tesseract-OCR>mftraining -F font_properties -U unicharset -O eng1.unicharset eng.lucidaconsole.box.tr

Warning: No shape table file present: shapetable
Failed to load unicharset from file unicharset
Building unicharset for training from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Reading eng.lucidaconsole.box.tr ...

Flat shape table summary: Number of shapes = 0 max unichars = 0 number with multiple unichars = 0

Done!

It rebuilt the unicharset I had already created and left me with a 1 KB unicharset containing only the following data:

1
NULL 0 NULL 0

At this point I don't know what to do. I'm using this program for the first time, but this doesn't look right to me.
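For readers who want to confirm the same symptom programmatically, here is a minimal Python sketch that inspects unicharset contents like those quoted above. It assumes the first line of the file holds the number of entries and that each following line starts with the character name, which matches the two-line file in the question. The function names `summarize_unicharset` and `looks_untrained` are hypothetical helpers and not part of Tesseract.

```python
def summarize_unicharset(text):
    """Return (declared_count, entry_names) for the contents of a unicharset file.

    Assumption: line 1 is the entry count and every later non-blank line
    begins with the entry's name (for example "NULL").
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("unicharset is empty")
    declared = int(lines[0])
    names = [line.split()[0] for line in lines[1:]]
    return declared, names


def looks_untrained(text):
    """True when the file holds nothing beyond NULL placeholder entries."""
    _, names = summarize_unicharset(text)
    return all(name == "NULL" for name in names)


# The exact contents reported in the question.
sample = "1\nNULL 0 NULL 0\n"
print(summarize_unicharset(sample))  # (1, ['NULL'])
print(looks_untrained(sample))       # True
```

A `True` result here would be consistent with the "Number of shapes = 0" line in the log, which suggests that no character data was read from the .tr file.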

Best answer

It looks like you need to cluster the character features of the training pages, as described here.

I believe the basic command looks like this:

shapeclustering -F font_properties -U unicharset lang.fontname.exp0.tr lang.fontname.exp1.tr ...

This appears to have been added in version 3.02.
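As a rough illustration of how the two steps might fit together, the Python sketch below builds the `shapeclustering` command from this answer and the `mftraining` command from the question, then runs them in that order. It uses only the flags that appear in this thread. Running `shapeclustering` first is an assumption based on the "No shape table file present" warning, and the helper names `build_commands` and `run_training` are hypothetical. The sketch also checks that the input files exist in the working directory, since the log reports "Failed to load unicharset from file unicharset".

```python
import subprocess
from pathlib import Path


def build_commands(tr_files, font_properties="font_properties",
                   unicharset="unicharset", out_unicharset="lan.unicharset"):
    """Build argument lists for shapeclustering and mftraining.

    Only flags quoted in the question and answer are used (-F, -U, -O).
    """
    common = ["-F", font_properties, "-U", unicharset]
    shape_cmd = ["shapeclustering", *common, *tr_files]
    mf_cmd = ["mftraining", *common, "-O", out_unicharset, *tr_files]
    return shape_cmd, mf_cmd


def run_training(tr_files, workdir="."):
    """Check that the inputs exist, then run both tools in workdir.

    Assumption: shapeclustering runs before mftraining.
    """
    workdir = Path(workdir)
    required = ["font_properties", "unicharset", *tr_files]
    missing = [name for name in required if not (workdir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {workdir}: {', '.join(missing)}")
    for cmd in build_commands(tr_files):
        # check=True raises CalledProcessError if a tool exits with an error.
        subprocess.run(cmd, cwd=workdir, check=True)


# Show the commands without executing them.
for cmd in build_commands(["eng.lucidaconsole.box.tr"]):
    print(" ".join(cmd))
```

Calling `run_training(["eng.lucidaconsole.box.tr"])` from the directory that holds the training files would execute both steps, provided the Tesseract training tools are on the PATH.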

Regarding "tesseract - Training Tesseract for a new font", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27048375/
