speech-recognition - Poor speech recognition results in sphinx4

Reposted by: 行者123 · Updated: 2023-12-03 23:35:10

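We are currently evaluating sphinx4 for speech recognition and are trying to get good results for a dictation-style application. The input is a WAV file that we want to transcribe. I looked at the LatticeDemo and Transcriber demos that ship with Sphinx4. When I use the same configuration, the results are poor. I tried tuning the configuration file, but it does not recognize the words at all. The bundled Transcriber demo is set up for digits, so I modified its configuration file to recognize general words, but I am not sure whether I have missed something. The configuration file is attached below. Please suggest any improvements.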

<config>        
    <!-- ******************************************************** -->
    <!-- frequently tuned properties                              -->
    <!-- ******************************************************** --> 
    <property name="absoluteBeamWidth"  value="500"/>
    <property name="relativeBeamWidth"  value="1E-60"/>
    <property name="absoluteWordBeamWidth" value="20"/>
    <property name="relativeWordBeamWidth" value="1E-40"/>
    <property name="wordInsertionProbability" value="1E-16"/>
    <property name="languageWeight" value="7.0"/>
    <property name="silenceInsertionProbability" value=".1"/>
    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>


    <!-- ******************************************************** -->
    <!-- word recognizer configuration                            -->
    <!-- ******************************************************** --> 

    <component name="recognizer" 
                          type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
            <item>recognizerMonitor </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder   configuration                              -->
    <!-- ******************************************************** --> 

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="wordPruningSearchManager"/>
        <property name="featureBlockSize" value="50"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Search Manager                                       -->
    <!-- ******************************************************** --> 

    <component name="wordPruningSearchManager" 
    type="edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="lexTreeLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListManager" value="activeListManager"/>
        <property name="growSkipInterval" value="0"/>
        <property name="checkStateOrder" value="false"/>
        <property name="buildWordLattice" value="true"/>
        <property name="acousticLookaheadFrames" value="1.7"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Active Lists                                         -->
    <!-- ******************************************************** --> 

    <component name="activeListManager" 
             type="edu.cmu.sphinx.decoder.search.SimpleActiveListManager">
        <propertylist name="activeListFactories">
            <item>standardActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>wordActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
            <item>standardActiveListFactory</item>
        </propertylist>
    </component>

    <component name="standardActiveListFactory" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="wordActiveListFactory" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteWordBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeWordBeamWidth}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Pruner                                               -->
    <!-- ******************************************************** --> 
    <component name="trivialPruner" 
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <!-- ******************************************************** -->
    <!-- The Scorer                                               -->
    <!-- ******************************************************** --> 
    <component name="threadedScorer" 
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist  configuration                              -->
    <!-- ******************************************************** -->

    <component name="lexTreeLinguist" 
                type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
        <property name="logMath" value="logMath"/>
        <property name="acousticModel" value="wsj"/>
        <property name="languageModel" value="trigramModel"/>
        <property name="dictionary" value="dictionary"/>
        <property name="addFillerWords" value="false"/>
        <property name="fillerInsertionProbability" value="1E-10"/>
        <property name="generateUnitStates" value="false"/>
        <property name="wantUnigramSmear" value="true"/>
        <property name="unigramSmearWeight" value="1"/>
        <property name="wordInsertionProbability" 
                value="${wordInsertionProbability}"/>
        <property name="silenceInsertionProbability" 
                value="${silenceInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>    


    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->
    <component name="dictionary" 
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath" 
              value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/noisedict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Language Model configuration                         -->
    <!-- ******************************************************** -->
    <component name="trigramModel" 
          type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
        <property name="unigramWeight" value=".5"/>
        <property name="maxDepth" value="3"/>
        <property name="logMath" value="logMath"/>
        <property name="dictionary" value="dictionary"/>
        <property name="location"
         value="./models/language/wsj/wsj5kc.Z.DMP"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
        <property name="location" value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The unit manager configuration                           -->
    <!-- ******************************************************** -->

    <component name="unitManager" 
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>


    <!-- ******************************************************** -->
    <!-- The frontend configuration                               -->
    <!-- ******************************************************** -->

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <component name="audioFileDataSource" type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>


    <component name="microphone" 
                type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="closeBetweenUtterances" value="false"/>
    </component>

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="speechClassifier"
                type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="nonSpeechDataFilter" 
                type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker" 
                type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
        <property name="speechTrailer" value="50"/>
    </component>

    <component name="preemphasizer"
        type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower" 
    type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower"/>

    <component name="fft" 
        type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform"/>

    <component name="melFilterBank" 
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank"/>

    <component name="dct" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN" 
                type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction" 
        type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <!-- Newly Added..   -->
    <component name="streamDataSource"
        type="edu.cmu.sphinx.frontend.util.StreamDataSource">
        <property name="sampleRate" value="16000"/>
        <property name="bigEndianData" value="false"/>
    </component>

    <!-- ******************************************************* -->
    <!--  monitors                                               -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker" 
                type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showRawResults" value="false"/>
        <property name="showAlignedResults" value="false"/>
    </component>

    <component name="memoryTracker" 
                type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showDetails" value="false"/>
        <property name="showSummary" value="false"/>
    </component>

    <component name="speedTracker" 
                type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="recognizerMonitor" 
                type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
        <property name="recognizer" value="${recognizer}"/>
        <propertylist name="allocatedMonitors">
            <item>configMonitor </item>
        </propertylist>
    </component>

    <component name="configMonitor" 
                type="edu.cmu.sphinx.instrumentation.ConfigMonitor">
        <property name="showConfig" value="false"/>
    </component>


    <!-- ******************************************************* -->
    <!--  Miscellaneous components                               -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>
</config>

Best answer

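The most common causes of poor recognition accuracy are:

A mismatch in the sample rate of the incoming audio. The file must be 16 kHz, 16-bit, mono, little-endian. You need to fix the sample rate of the source by resampling.
Regions of all-zero silence in audio files decoded from MP3, which break the decoder. You can fix this by using dithering to introduce a small amount of random noise.
An acoustic model mismatch. You can use acoustic model adaptation to improve accuracy.
A language model mismatch. You can create your own language model that matches the vocabulary you are trying to decode.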
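
The first cause is easy to rule out before touching the configuration. The following is a minimal sketch, not part of Sphinx4, that uses only the standard `javax.sound.sampled` API to check whether a file is 16 kHz, 16-bit, mono, little-endian PCM. The class name `WavFormatCheck` and the method `isSphinxCompatible` are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper (not a Sphinx4 class): checks the audio format
// named in the answer, i.e. 16 kHz, 16-bit, mono, little-endian PCM.
final class WavFormatCheck {
    static final float EXPECTED_SAMPLE_RATE = 16000f;

    /** Returns true if fmt is signed 16-bit, mono, little-endian PCM at 16 kHz. */
    static boolean isSphinxCompatible(AudioFormat fmt) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                && fmt.getSampleRate() == EXPECTED_SAMPLE_RATE
                && fmt.getSampleSizeInBits() == 16
                && fmt.getChannels() == 1
                && !fmt.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Reads only the file header; the audio data itself is not decoded.
        AudioFormat fmt = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println(fmt + " -> "
                + (isSphinxCompatible(fmt) ? "OK" : "needs conversion"));
    }
}
```

If the check reports that conversion is needed, resample the file with an audio tool of your choice before passing it to the recognizer. The sketch only detects the mismatch and does not convert anything.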

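For more detailed help, you can always provide a sample of the audio you are trying to decode; it will help the developers analyze the problem. It also helps to provide the actual results you get from the decoder and what you expected.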
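
To illustrate the dithering mentioned in the answer for all-zero silence regions, here is a minimal sketch that adds a small amount of uniform random noise to 16-bit PCM samples. It is an assumption about one simple way to dither and is not Sphinx4's own implementation. The class `DitherSketch`, the method `dither`, and the parameters `maxNoise` and `seed` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of dithering, not Sphinx4 code.
final class DitherSketch {
    /**
     * Returns a copy of the samples with uniform random noise in
     * [-maxNoise, +maxNoise] added to each one, clamped to the 16-bit range.
     * The input array is left unchanged.
     */
    static short[] dither(short[] samples, int maxNoise, long seed) {
        if (maxNoise < 0) {
            throw new IllegalArgumentException("maxNoise must be >= 0");
        }
        Random rng = new Random(seed); // fixed seed keeps runs reproducible
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int noise = rng.nextInt(2 * maxNoise + 1) - maxNoise;
            int value = samples[i] + noise;
            // Clamp so that samples near full scale do not wrap around.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, value));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] silence = new short[8]; // digital zero, as after MP3 decoding
        System.out.println(Arrays.toString(dither(silence, 2, 42L)));
    }
}
```

The noise amplitude is kept to a few least-significant bits so that it stays inaudible next to real speech while ensuring that silent stretches are no longer exactly zero.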

Regarding "speech-recognition - Poor speech recognition results in sphinx4", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7411563/
