python - How do I enable only digits in pytesser?

Reposted by 太空宇宙, updated 2023-11-04 06:24:17

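I'm running pytesser to OCR images in Python. The first image I grab from a page comes out fine, but over the next few pages the accuracy gets worse and worse, until 87+1 is read as $+$.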

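Weird, right? My guess is that pytesser (a Python wrapper for Tesseract) is built to recognize words, and carries what it read on one question over as context for the next. So if there is no way to disable that, can I at least restrict it to digits only? pytesser doesn't have much documentation, so I went to the Tesseract FAQ instead, but I didn't really understand the code there: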

Use
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
BEFORE calling an Init function or put this in a text file called tessdata/configs/digits:
tessedit_char_whitelist 0123456789
and then your command line becomes:
tesseract image.tif outputbase nobatch digits
Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.

I'm guessing that applies to C or C++. Is there a way to do this in Python? Or, better yet, to disable the OCR context altogether?

Best answer

In Python:

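import tesseract  # the python-tesseract bindings, not pytesser

ocr = tesseract.TessBaseAPI()
# Init(datapath, language, engine mode)
ocr.Init(".", "eng", tesseract.OEM_TESSERACT_ONLY)
# Restrict recognition to the ten digits
ocr.SetVariable("tessedit_char_whitelist", "0123456789")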
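One caveat for the question's own example: a whitelist of only "0123456789" restricts output to digits, so the "+" in 87+1 can no longer be recognized. A minimal sketch, assuming every image holds a single sum of the form a+b (`SUM_WHITELIST` and `parse_sum` are hypothetical names, not part of any library), adds "+" to the whitelist and checks the OCR result, so that garbage such as $+$ is rejected instead of used:

```python
import re
from typing import Optional, Tuple

# Hypothetical whitelist for images like "87+1": the ten digits plus "+".
# Pass it as ocr.SetVariable("tessedit_char_whitelist", SUM_WHITELIST).
SUM_WHITELIST = "0123456789+"

# Assumption: each image holds exactly one "a+b"; spaces around "+" are tolerated.
_SUM_RE = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*$")


def parse_sum(ocr_text: str) -> Optional[Tuple[int, int]]:
    """Return (a, b) if the OCR text reads as 'a+b', otherwise None."""
    match = _SUM_RE.match(ocr_text)
    if match is None:
        return None  # e.g. "$+$": reject the page rather than guess
    return int(match.group(1)), int(match.group(2))


print(parse_sum("87+1"))      # (87, 1)
print(parse_sum("87 + 1\n"))  # (87, 1)
print(parse_sum("$+$"))       # None
```

The check doubles as a safety net: if a page still comes back malformed, it can be retried or flagged rather than silently passed on.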

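You may also want to turn off the adaptive classifier, so that what Tesseract learns from earlier images does not carry over to later ones: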

ocr.SetVariable("classify_enable_learning", "0")
ocr.SetVariable("classify_enable_adaptive_matcher", "0")
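pytesser itself does not bind the C++ API; it runs the `tesseract` executable on a temporary image file, so with pytesser the FAQ's config-file route is the one that carries over. A minimal sketch of that route, assuming `tesseract` is on the PATH and that you can write to its `tessdata/configs` directory (`write_digits_config`, `build_command` and `ocr_digits` are hypothetical helpers, not part of pytesser), puts the answer's three variables into a `digits` config file and passes it on the command line:

```python
import subprocess
from pathlib import Path
from typing import List

# Same variables as the answer, in Tesseract's config-file format ("name value" per line).
DIGITS_CONFIG = (
    "tessedit_char_whitelist 0123456789\n"
    "classify_enable_learning 0\n"
    "classify_enable_adaptive_matcher 0\n"
)


def write_digits_config(tessdata_dir: Path, name: str = "digits") -> Path:
    """Write <tessdata_dir>/configs/<name>; tessdata_dir is an assumed install path."""
    config_path = tessdata_dir / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(DIGITS_CONFIG)
    return config_path


def build_command(image: str, outputbase: str, config: str = "digits",
                  nobatch: bool = False, exe: str = "tesseract") -> List[str]:
    """Argument list equivalent to: tesseract image outputbase [nobatch] config"""
    cmd = [exe, image, outputbase]
    if nobatch:
        cmd.append("nobatch")  # per the FAQ, needed on old releases only
    cmd.append(config)
    return cmd


def ocr_digits(image: str, outputbase: str = "out", **options) -> str:
    """Run tesseract, then read the text it writes to <outputbase>.txt."""
    subprocess.run(build_command(image, outputbase, **options), check=True)
    return Path(outputbase + ".txt").read_text()


# Example (requires a Tesseract install; the tessdata path is a placeholder):
# write_digits_config(Path("path/to/tessdata"))
# print(ocr_digits("image.tif", "outputbase", nobatch=True))
```

`nobatch` defaults to off because the FAQ presents it as a workaround for old releases; pass `nobatch=True` to reproduce the FAQ's command line exactly. The two `classify_*` settings belong to the legacy classifier, so they may have no effect in newer LSTM-based engine modes.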

Regarding python - How do I enable only digits in pytesser?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9466694/
