
google-cloud-platform - Asynchronous issues with the Google Cloud Speech API


I am trying to get final speech transcription/recognition results from a Fleck websocket audio stream. The OnOpen method runs code when a websocket connection is first established, and the OnBinary method runs code whenever binary data is received from the client. I have tested the websocket by echoing the voice back, writing the same binary data into the websocket at the same rate it arrives. That test works, so I know the binary data is being sent correctly (640-byte messages with a 20 ms frame size).
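
For reference, the echo test described above can be set up with a minimal Fleck server along these lines. This is only a sketch; the listen URL and the class structure are placeholders, not taken from the question:

    using System;
    using Fleck;

    class EchoTest
    {
        static void Main()
        {
            // Placeholder endpoint; use whatever host/port the real server listens on.
            var server = new WebSocketServer("ws://0.0.0.0:8181");

            server.Start(socket =>
            {
                socket.OnOpen = () => Console.WriteLine("Client connected");
                // Echo each binary audio frame straight back to the client,
                // which is the sanity check described in the question.
                socket.OnBinary = bytes => socket.Send(bytes);
                socket.OnClose = () => Console.WriteLine("Client disconnected");
            });

            Console.ReadLine();
        }
    }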

So it is my code that is failing, not the service. My goal is to do the following:

  • Upon creation of the websocket connection, send the initial audio config request to the API with SingleUtterance == true
  • Run a background task listening for streaming results, waiting for isFinal == true
  • Send each binary message received to the API for transcription
  • When the background task recognizes isFinal == true, stop the current streaming request and create a new one - repeating steps 1 through 4

The point of this project is to transcribe all individual utterances during a live phone call.

    socket.OnOpen = () =>
    {
        firstMessage = true;
    };
    socket.OnBinary = async binary =>
    {
        var speech = SpeechClient.Create();
        var streamingCall = speech.StreamingRecognize();
        if (firstMessage == true)
        {
            await streamingCall.WriteAsync(
                new StreamingRecognizeRequest()
                {
                    StreamingConfig = new StreamingRecognitionConfig()
                    {
                        Config = new RecognitionConfig()
                        {
                            Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                            SampleRateHertz = 16000,
                            LanguageCode = "en",
                        },
                        SingleUtterance = true,
                    }
                });
            Task getUtterance = Task.Run(async () =>
            {
                while (await streamingCall.ResponseStream.MoveNext(
                    default(CancellationToken)))
                {
                    foreach (var result in streamingCall.ResponseStream.Current.Results)
                    {
                        if (result.IsFinal == true)
                        {
                            Console.WriteLine("This test finally worked");
                        }
                    }
                }
            });
            firstMessage = false;
        }
        else if (firstMessage == false)
        {
            streamingCall.WriteAsync(new StreamingRecognizeRequest()
            {
                AudioContent = Google.Protobuf.ByteString.CopyFrom(binary, 0, 640)
            }).Wait();
        }
    };

Best Answer

The major issue is separating a segment of the stream to send the speech requests. I found code that can help you with the websockets and speech integration, Google-Cloud-Speech-Node-Socket-Playground, and studied the function that manages the Google Speech requests:

    // speechClient, request, and recognizeStream are defined elsewhere in the
    // playground project, outside this function.
    function startRecognitionStream(client, data) {
        recognizeStream = speechClient.streamingRecognize(request)
            .on('error', console.error)
            .on('data', (data) => {
                process.stdout.write(
                    (data.results[0] && data.results[0].alternatives[0])
                        ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
                        : `\n\nReached transcription time limit, press Ctrl+C\n`);
                client.emit('speechData', data);

                // if end of utterance, let's restart stream
                // this is a small hack. After 65 seconds of silence, the stream will still throw an error for speech length limit
                if (data.results[0] && data.results[0].isFinal) {
                    stopRecognitionStream();
                    startRecognitionStream(client);
                    // console.log('restarted stream serverside');
                }
            });
    }
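
Translating the same restart-on-isFinal pattern back to the C# client from the question might look roughly like the sketch below. It assumes the same Google.Cloud.Speech.V1 client version used in the question (the ResponseStream-based API); the class UtteranceTranscriber and the helper names ConfigureSocket and StartNewStreamAsync are illustrative, not taken from the original answer:

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Fleck;
    using Google.Cloud.Speech.V1;

    public class UtteranceTranscriber
    {
        // One long-lived client; the streaming call is replaced after each final utterance.
        private readonly SpeechClient speech = SpeechClient.Create();
        private SpeechClient.StreamingRecognizeStream streamingCall;

        public void ConfigureSocket(IWebSocketConnection socket)
        {
            socket.OnOpen = async () => await StartNewStreamAsync();
            socket.OnBinary = async binary =>
            {
                var call = streamingCall;
                if (call != null)
                {
                    // Forward each audio frame to whichever stream is currently open.
                    await call.WriteAsync(new StreamingRecognizeRequest
                    {
                        AudioContent = Google.Protobuf.ByteString.CopyFrom(binary, 0, binary.Length)
                    });
                }
            };
        }

        private async Task StartNewStreamAsync()
        {
            streamingCall = speech.StreamingRecognize();

            // Initial config request with SingleUtterance = true, as in the question.
            await streamingCall.WriteAsync(new StreamingRecognizeRequest
            {
                StreamingConfig = new StreamingRecognitionConfig
                {
                    Config = new RecognitionConfig
                    {
                        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                        SampleRateHertz = 16000,
                        LanguageCode = "en",
                    },
                    SingleUtterance = true,
                }
            });

            // Background listener: on IsFinal, close this stream and open the next one,
            // mirroring the restart done in the Node example above.
            var call = streamingCall;
            _ = Task.Run(async () =>
            {
                while (await call.ResponseStream.MoveNext(default(CancellationToken)))
                {
                    foreach (var result in call.ResponseStream.Current.Results)
                    {
                        if (result.IsFinal)
                        {
                            Console.WriteLine($"Utterance: {result.Alternatives[0].Transcript}");
                            await call.WriteCompleteAsync();
                            await StartNewStreamAsync();
                            return;
                        }
                    }
                }
            });
        }
    }

Wiring it up would just mean calling ConfigureSocket(socket) from server.Start, so every OnBinary frame goes to the currently open stream instead of a new one per message.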

Keep in mind that bad audio quality yields bad results. Try to follow the audio Best Practices.

Credit should go to the developer (Vinzenz Aubry), because his program works nicely!

Regarding google-cloud-platform - asynchronous issues with the Google Cloud Speech API, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/52242334/
