
ios - Continuous speech recognition using SFSpeechRecognizer (iOS 10 beta)


I am trying to perform continuous speech recognition on the iOS 10 beta using AVCapture. I have set up captureOutput(...) to continuously receive CMSampleBuffers, and I put these buffers directly into the SFSpeechAudioBufferRecognitionRequest that I set up beforehand, like this:

// ... do some setup
SFSpeechRecognizer.requestAuthorization { authStatus in
    if authStatus == SFSpeechRecognizerAuthorizationStatus.authorized {
        self.m_recognizer = SFSpeechRecognizer()
        self.m_recognRequest = SFSpeechAudioBufferRecognitionRequest()
        self.m_recognRequest?.shouldReportPartialResults = false
        self.m_isRecording = true
    } else {
        print("not authorized")
    }
}
// ... do further setup


func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if !m_AV_initialized {
        print("captureOutput(...): not initialized!")
        return
    }
    if !m_isRecording {
        return
    }

    let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer)
    let mediaType = CMFormatDescriptionGetMediaType(formatDesc!)
    if mediaType == kCMMediaType_Audio {
        // process audio here
        m_recognRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
}

The whole thing works for a few seconds, and then captureOutput is no longer called. If I comment out the appendAudioSampleBuffer(sampleBuffer) line, captureOutput is called for as long as the app runs (as expected). Apparently, feeding the sample buffers into the speech recognition engine somehow blocks further execution. My guess is that the available buffers are consumed after some time and the process halts because it can no longer obtain any buffers?

I should mention that everything recorded within the first two seconds is recognized correctly. I just don't know exactly how the SFSpeech API works, since Apple has not put any text into the beta docs. BTW: how do I use SFSpeechAudioBufferRecognitionRequest.endAudio()?
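From the beta headers, endAudio() looks as if it is meant to signal that no more audio will be appended, after which the request should finish and deliver its final result. A minimal sketch, reusing the m_recognRequest and m_isRecording properties from above (the stopRecognition name is just illustrative):

func stopRecognition() {
    m_isRecording = false
    // Signal that no more audio is coming; the recognition task
    // should then complete and return its final result.
    m_recognRequest?.endAudio()
}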

Does anyone here know more about this?

Thanks, Chris

Best Answer

I converted the SpeakToMe sample Swift code from the speech recognition WWDC developer talk to Objective-C, and it works for me. For Swift, see https://developer.apple.com/videos/play/wwdc2016/509/ , or for Objective-C, see below.

- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];

    _recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
    [_recognizer setDelegate:self];

    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                // User gave access to speech recognition
                NSLog(@"Authorized");
                break;

            case SFSpeechRecognizerAuthorizationStatusDenied:
                // User denied access to speech recognition
                NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
                break;

            case SFSpeechRecognizerAuthorizationStatusRestricted:
                // Speech recognition restricted on this device
                NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
                break;

            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                // Speech recognition not yet authorized
                break;

            default:
                NSLog(@"Default");
                break;
        }
    }];

    audioEngine = [[AVAudioEngine alloc] init];
    _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    [_speechSynthesizer setDelegate:self];
}
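(The snippet assumes the view controller declares conformance to SFSpeechRecognizerDelegate, SFSpeechRecognitionTaskDelegate, and AVSpeechSynthesizerDelegate, and holds _recognizer, _speechSynthesizer, audioEngine, inputNode, request2, and _currentTask as instance variables; the original answer does not show the class interface.)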


- (void)startRecording
{
    [self clearLogs:nil];

    NSError *outError;

    // Configure the shared audio session for recording
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
    [audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

    request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

    inputNode = [audioEngine inputNode];

    if (request2 == nil) {
        NSLog(@"Unable to create a SFSpeechAudioBufferRecognitionRequest object");
    }

    if (inputNode == nil) {
        NSLog(@"Unable to create an inputNode object");
    }

    request2.shouldReportPartialResults = true;

    _currentTask = [_recognizer recognitionTaskWithRequest:request2
                                                  delegate:self];

    // Tap the input node and feed each PCM buffer into the recognition request
    [inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
        NSLog(@"Block tap!");
        [request2 appendAudioPCMBuffer:buffer];
    }];

    [audioEngine prepare];
    [audioEngine startAndReturnError:&outError];
    NSLog(@"Error %@", outError); // prints "(null)" on success
}

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {
    NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");

    NSString *translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    [self log:translatedString];

    if ([result isFinal]) {
        [audioEngine stop];
        [inputNode removeTapOnBus:0];
        _currentTask = nil;
        request2 = nil;
    }
}
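For reference, here is a rough Swift counterpart of the same AVAudioEngine-based approach. The class and member names are illustrative, not from the WWDC sample, and it uses the released Swift API names, which may differ slightly from the iOS 10 beta:

import AVFoundation
import Speech

final class SpeechController {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    // Assumes SFSpeechRecognizer authorization has already been granted
    // (see the requestAuthorization call in viewDidAppear above).
    func startRecording() throws {
        // Configure the shared audio session for recording
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .measurement)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        // Tap the microphone input and feed each PCM buffer to the request
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
            request.append(buffer)
        }

        task = recognizer?.recognitionTask(with: request) { [weak self] result, error in
            if let result = result {
                print(result.bestTranscription.formattedString)
                if result.isFinal {
                    self?.stopRecording()
                }
            }
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        // Mark the end of the audio input so the task can finish
        request?.endAudio()
        request = nil
        task = nil
    }
}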

This question about ios - Continuous speech recognition using SFSpeechRecognizer (iOS 10 beta) originally appeared on Stack Overflow: https://stackoverflow.com/questions/37821826/
