
.net - Microsoft Speech Recognition: Alternate results with confidence score?

Reposted. Author: 行者123. Updated: 2023-12-01 05:17:23

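I'm new to the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK version 11), and I'm trying to get it to output the n-best recognition matches from a simple grammar, along with the confidence score for each.

According to the documentation (and as mentioned in the answer to this question), it should be possible to use e.Result.Alternates to access recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0 (which should mean that nothing is rejected), I still get only one result and no alternates, although the SpeechHypothesized events show that at least one of the other words was recognized at some point with non-zero confidence.

My question: can anyone explain why I get only one recognized word, even with the confidence rejection threshold set to zero? How can I get the other possible matches and their confidence scores? What am I missing here?

My code is below. Thanks in advance to anyone who can help :)

In the example below, the recognizer is sent a wav file of the single word "news" and has to choose among similar words ("noose", "newts"). I'd like to extract a list of the recognizer's confidence scores for each word (they should all be non-zero), even if it returns only the best one ("news") as the result.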

using System;
using Microsoft.Speech.Recognition;

namespace SimpleRecognizer
{
    class Program
    {
        // Recognizer settings (confidence thresholds) to display and reset.
        static readonly string[] settings = new string[] {
            "CFGConfidenceRejectionThreshold",
            "HighConfidenceThreshold",
            "NormalConfidenceThreshold",
            "LowConfidenceThreshold"};

        static void Main(string[] args)
        {
            // Create a new SpeechRecognitionEngine instance.
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); // en-US SRE

            // Configure the input to the recognizer.
            sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");

            // Display recognizer settings (confidence thresholds).
            // The header lines match the output shown below.
            Console.WriteLine("Original recognizer settings:");
            ListSettings(sre);

            // Set confidence thresholds to zero (nothing should be rejected).
            sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
            sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);

            // Display the new recognizer settings.
            Console.WriteLine("Updated recognizer settings:");
            ListSettings(sre);

            // Build a simple Grammar with three choices.
            Choices topics = new Choices();
            topics.Add(new string[] { "news", "newts", "noose" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(topics);
            Grammar g = new Grammar(gb);
            g.Name = "g";

            // Load the Grammar.
            sre.LoadGrammar(g);

            // Register a handler for the Grammar's SpeechRecognized event.
            g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);

            // Register a handler for the recognizer's SpeechRecognized event.
            sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

            // Register a handler for SpeechHypothesized.
            sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

            // Start recognition (synchronous, single recognition).
            sre.Recognize();

            Console.ReadKey(); // wait to close
        }

        // Lists the alternates reported through the Grammar's event.
        static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }

        // Lists the top result and the alternates reported through the recognizer's event.
        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
            Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }

        // Prints each interim hypothesis with its confidence.
        static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
        }

        // Prints the current value of each setting, or a note if it is unsupported.
        private static void ListSettings(SpeechRecognitionEngine recognizer)
        {
            foreach (string setting in settings)
            {
                try
                {
                    object value = recognizer.QueryRecognizerSetting(setting);
                    Console.WriteLine(" {0,-30} = {1}", setting, value);
                }
                catch
                {
                    Console.WriteLine(" {0,-30} is not supported by this recognizer.",
                        setting);
                }
            }
            Console.WriteLine();
        }
    }
}

This gives the following output:
Original recognizer settings:
CFGConfidenceRejectionThreshold = 20
HighConfidenceThreshold = 80
NormalConfidenceThreshold = 50
LowConfidenceThreshold = 20

Updated recognizer settings:
CFGConfidenceRejectionThreshold = 0
HighConfidenceThreshold = 0
NormalConfidenceThreshold = 0
LowConfidenceThreshold = 0

Speech from grammar g hypothesized: noose, 0.2214646
Speech from grammar g hypothesized: news, 0.640804

Number of Alternates from Grammar g: 1
news, 0.9208503

Speech recognized: news, 0.9208503
Number of Alternates from Recognizer: 1
news, 0.9208503

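I also tried using a separate phrase for each word (instead of one phrase with three choices), and even a separate grammar for each word/phrase. The results were basically the same: only one "alternate".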

Best answer

I believe this is another place where SAPI lets you ask for things that the SR engine doesn't really support.

Both Microsoft.Speech.Recognition and System.Speech.Recognition use the underlying SAPI interfaces to do their work; the only difference is which SR engine gets used. (Microsoft.Speech.Recognition uses the Server engine; System.Speech.Recognition uses the Desktop engine.)

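Alternates are primarily designed for dictation, not context-free grammars (CFGs). You can always get one alternate for a CFG, but the alternate generation code looks like it won't expand the alternates for CFGs.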
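
If Alternates really is limited to one entry for a CFG, one possible workaround (not part of the answer, and only a sketch) is to record what the SpeechHypothesized event reports, since the question's output shows other words appearing there. The names HypothesisLog, OnHypothesized, and Dump are hypothetical, and the confidences are interim values rather than a true n-best list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Speech.Recognition;

// Hypothetical helper: remembers the highest interim confidence per phrase.
static class HypothesisLog
{
    static readonly Dictionary<string, float> best = new Dictionary<string, float>();

    // Attach with: sre.SpeechHypothesized += HypothesisLog.OnHypothesized;
    public static void OnHypothesized(object sender, SpeechHypothesizedEventArgs e)
    {
        float seen;
        if (!best.TryGetValue(e.Result.Text, out seen) || e.Result.Confidence > seen)
        {
            best[e.Result.Text] = e.Result.Confidence;
        }
    }

    // Call after sre.Recognize() returns to print the collected phrases,
    // highest interim confidence first.
    public static void Dump()
    {
        foreach (var pair in best.OrderByDescending(p => p.Value))
        {
            Console.WriteLine("{0}, {1}", pair.Key, pair.Value);
        }
    }
}
```

For the run shown in the question this would capture only "noose" and "news", because "newts" was never hypothesized, so it does not guarantee a score for every word in the grammar.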

Unfortunately, the Microsoft.Speech.Recognition engine doesn't support dictation. (It does, however, work with much lower quality audio, and it doesn't need training.)
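
For comparison, here is a minimal sketch of asking the Desktop engine for alternates with a dictation grammar through System.Speech.Recognition. It assumes a reference to the System.Speech assembly and an installed desktop recognizer. The wav path is a placeholder, and the value 5 for MaxAlternates is an arbitrary choice. How many alternates actually come back depends on the engine and the audio.

```csharp
using System;
using System.Speech.Recognition; // Desktop engine, not Microsoft.Speech

class DictationAlternates
{
    static void Main()
    {
        using (var sre = new SpeechRecognitionEngine())
        {
            // Placeholder path; substitute a real wav file.
            sre.SetInputToWaveFile(@"C:\path\to\news.wav");

            // Free-form dictation instead of a context-free grammar.
            sre.LoadGrammar(new DictationGrammar());

            // Upper bound on alternates; the engine may return fewer.
            sre.MaxAlternates = 5;

            RecognitionResult result = sre.Recognize();
            if (result == null)
            {
                Console.WriteLine("Nothing recognized.");
                return;
            }

            foreach (RecognizedPhrase phrase in result.Alternates)
            {
                Console.WriteLine("{0}, {1}", phrase.Text, phrase.Confidence);
            }
        }
    }
}
```

Because dictation is not constrained to "news", "newts", and "noose", the alternates may be arbitrary words, so this only demonstrates the mechanism and does not score a fixed word list.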

Regarding ".net - Microsoft Speech Recognition: Alternate results with confidence score?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/18965286/
