
java - Streaming audio from a microphone to the IBM Watson SpeechToText web service using the Java SDK


I'm trying to send a continuous audio stream from the microphone directly to the IBM Watson SpeechToText web service using the Java SDK. One of the examples shipped with the distribution (RecognizeUsingWebSocketsExample) shows how to stream a file in .WAV format to the service. However, .WAV files require that their length be specified in advance, so the naive approach of appending one buffer at a time to the file is not feasible.
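For reference, the length constraint comes from the RIFF container itself: the canonical 44-byte PCM WAV header stores two little-endian 32-bit size fields (the overall RIFF chunk size at offset 4 and the data chunk size at offset 40) that must be known up front or patched afterwards. A minimal sketch of that patching step (WavHeaderPatcher is a hypothetical helper, not part of the SDK):

import java.io.IOException;
import java.io.RandomAccessFile;

public class WavHeaderPatcher {
    /** Rewrites the two size fields of a canonical 44-byte PCM WAV header. */
    public static void patchSizes(RandomAccessFile wav) throws IOException {
        long total = wav.length();
        wav.seek(4);                        // RIFF chunk size = file length - 8
        writeIntLE(wav, (int) (total - 8));
        wav.seek(40);                       // data chunk size = file length - 44
        writeIntLE(wav, (int) (total - 44));
    }

    private static void writeIntLE(RandomAccessFile f, int v) throws IOException {
        f.write(v & 0xFF);
        f.write((v >>> 8) & 0xFF);
        f.write((v >>> 16) & 0xFF);
        f.write((v >>> 24) & 0xFF);
    }
}

This is exactly the step a live microphone stream cannot perform, because the total length is unknown while recording.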

It appears that SpeechToText.recognizeUsingWebSocket can accept a stream, but feeding it an instance of AudioInputStream doesn't seem to do it: the connection is established, but no transcripts are returned, even with RecognizeOptions.interimResults(true).

public class RecognizeUsingWebSocketsExample {
  private static CountDownLatch lock = new CountDownLatch(1);

  public static void main(String[] args) throws FileNotFoundException, InterruptedException {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<username>", "<password>");

    AudioInputStream audio = null;

    try {
      final AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
      DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
      TargetDataLine line;
      line = (TargetDataLine) AudioSystem.getLine(info);
      line.open(format);
      line.start();
      audio = new AudioInputStream(line);
    } catch (LineUnavailableException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }

    RecognizeOptions options = new RecognizeOptions.Builder()
        .continuous(true)
        .interimResults(true)
        .contentType(HttpMediaType.AUDIO_WAV)
        .build();

    service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
      @Override
      public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
        if (speechResults.isFinal())
          lock.countDown();
      }
    });

    lock.await(1, TimeUnit.MINUTES);
  }
}

Any help would be greatly appreciated.

-rg

Here's an update based on German's comment below (thanks).

I was able to use javaFlacEncode to convert the WAV stream coming from the microphone into a FLAC stream and save it to a temporary file. Unlike a WAV audio file, whose size is fixed at creation, a FLAC file can easily be appended to.
WAV_audioInputStream = new AudioInputStream(line);
FileInputStream FLAC_audioInputStream = new FileInputStream(tempFile);

StreamConfiguration streamConfiguration = new StreamConfiguration();
streamConfiguration.setSampleRate(16000);
streamConfiguration.setBitsPerSample(8);
streamConfiguration.setChannelCount(1);

flacEncoder = new FLACEncoder();
flacOutputStream = new FLACFileOutputStream(tempFile); // write to temp disk file

flacEncoder.setStreamConfiguration(streamConfiguration);
flacEncoder.setOutputStream(flacOutputStream);

flacEncoder.openFLACStream();

...
// convert data
int frameLength = 16000;
int[] intBuffer = new int[frameLength];
byte[] byteBuffer = new byte[frameLength];

while (true) {
  int count = WAV_audioInputStream.read(byteBuffer, 0, frameLength);
  for (int j1 = 0; j1 < count; j1++)
    intBuffer[j1] = byteBuffer[j1];

  flacEncoder.addSamples(intBuffer, count);
  flacEncoder.encodeSamples(count, false); // 'false' means non-final frame
}

flacEncoder.encodeSamples(flacEncoder.samplesAvailableToEncode(), true); // final frame
WAV_audioInputStream.close();
flacOutputStream.close();
FLAC_audioInputStream.close();

The resulting file can be analyzed without problems (using curl as well as recognizeUsingWebSocket()) after appending an arbitrary number of frames. However, recognizeUsingWebSocket() returns its final result as soon as it reaches the end of the FLAC file, even though the file's last frame may not be final (i.e., after encodeSamples(count, false)).

I would expect recognizeUsingWebSocket() to block until the final frame has been written to the file. In practice, this means that the analysis stops after the first frame, since analyzing the first frame takes less time than collecting the second one, so by the time results are returned, the end of the file has been reached.
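One way to get that blocking behavior (not something the original example provides) would be to wrap the temp file in an InputStream that, on hitting end-of-file, waits for the encoder to append more data instead of returning -1, until the recording thread signals completion. A rough sketch; TailInputStream and its finish() method are hypothetical names:

import java.io.IOException;
import java.io.InputStream;

/** Blocks on EOF until the producer marks the stream as finished. */
class TailInputStream extends InputStream {
  private final InputStream source;
  private volatile boolean finished = false;

  TailInputStream(InputStream source) {
    this.source = source;
  }

  /** Called by the recording thread after the final FLAC frame is written. */
  void finish() {
    finished = true;
  }

  @Override
  public int read() throws IOException {
    int b;
    while ((b = source.read()) < 0) {
      if (finished) {
        return -1; // the real end of audio
      }
      try {
        Thread.sleep(100); // wait for the encoder to append more data
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -1;
      }
    }
    return b;
  }
}

Passing new TailInputStream(FLAC_audioInputStream) to recognizeUsingWebSocket() would then keep the WebSocket fed until finish() is called.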

Is this the right way to implement streaming audio from a microphone in Java? It seems like a common use case.

Here is a modification of RecognizeUsingWebSocketsExample, incorporating some of Daniel's suggestions below. It uses the PCM content type (passed as a String, together with a frame size), and attempts to signal the end of the audio stream, albeit not very successfully.

As before, the connection is established, but the recognize callback is never called. Closing the stream does not seem to be interpreted as end-of-audio either. I must be misunderstanding something here...
public static void main(String[] args) throws IOException, LineUnavailableException, InterruptedException {

  final PipedOutputStream output = new PipedOutputStream();
  final PipedInputStream input = new PipedInputStream(output);

  final AudioFormat format = new AudioFormat(16000, 8, 1, true, false);
  DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
  final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
  line.open(format);
  line.start();

  Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        final int MAX_FRAMES = 2;
        byte buffer[] = new byte[16000];
        for (int j1 = 0; j1 < MAX_FRAMES; j1++) { // read two frames from microphone
          int count = line.read(buffer, 0, buffer.length);
          System.out.println("Read audio frame from line: " + count);
          output.write(buffer, 0, buffer.length);
          System.out.println("Written audio frame to pipe: " + count);
        }
        /** no need to fake end-of-audio; StopMessage will be sent
         * automatically by SDK once the pipe is drained (see WebSocketManager)
        // signal end of audio; based on WebSocketUploader.stop() source
        byte[] stopData = new byte[0];
        output.write(stopData);
        **/
      } catch (IOException e) {
      }
    }
  });
  thread1.start();

  final CountDownLatch lock = new CountDownLatch(1);

  SpeechToText service = new SpeechToText();
  service.setUsernameAndPassword("<username>", "<password>");

  RecognizeOptions options = new RecognizeOptions.Builder()
      .continuous(true)
      .interimResults(false)
      .contentType("audio/pcm; rate=16000")
      .build();

  service.recognizeUsingWebSocket(input, options, new BaseRecognizeCallback() {
    @Override
    public void onConnected() {
      System.out.println("Connected.");
    }
    @Override
    public void onTranscription(SpeechResults speechResults) {
      System.out.println("Received results.");
      System.out.println(speechResults);
      if (speechResults.isFinal())
        lock.countDown();
    }
  });

  System.out.println("Waiting for STT callback ... ");

  lock.await(5, TimeUnit.SECONDS);

  line.stop();

  System.out.println("Done waiting for STT callback.");

}

Dani, I instrumented the source of WebSocketManager (which comes with the SDK) and replaced the call to sendMessage() with an explicit StopMessage payload, as follows:
/**
 * Send input stream.
 *
 * @param inputStream the input stream
 * @throws IOException Signals that an I/O exception has occurred.
 */
private void sendInputSteam(InputStream inputStream) throws IOException {
  int cumulative = 0;
  byte[] buffer = new byte[FOUR_KB];
  int read;
  while ((read = inputStream.read(buffer)) > 0) {
    cumulative += read;
    if (read == FOUR_KB) {
      socket.sendMessage(RequestBody.create(WebSocket.BINARY, buffer));
    } else {
      System.out.println("completed sending " + cumulative / 16000 + " frames over socket");
      socket.sendMessage(RequestBody.create(WebSocket.BINARY, Arrays.copyOfRange(buffer, 0, read))); // partial buffer write
      System.out.println("signaling end of audio");
      socket.sendMessage(RequestBody.create(WebSocket.TEXT, buildStopMessage().toString())); // end of audio signal
    }
  }
  inputStream.close();
}

Neither of the sendMessage() options (sending 0-length binary content, or sending the stop text message) seems to work. The caller code is unchanged from above. The resulting output is:
Waiting for STT callback ... 
Connected.
Read audio frame from line: 16000
Written audio frame to pipe: 16000
Read audio frame from line: 16000
Written audio frame to pipe: 16000
completed sending 2 frames over socket
onFailure: java.net.SocketException: Software caused connection abort: socket write error

Revised: actually, the end-of-audio call is never reached. An exception is thrown while writing the last (partial) buffer to the socket.

Why is the connection being aborted? That usually happens when the peer closes the connection.

As for point 2): does either of these matter at this stage? It appears the recognition process isn't being started at all... The audio is valid (I wrote the stream to disk and was able to recognize it by streaming it from a file, as I noted above).

Also, on further review of the WebSocketManager source, onMessage() sends the StopMessage immediately upon return from sendInputSteam() (i.e., when the audio stream, or the pipe in the example above, is drained), so there is no need to call it explicitly. The problem definitely occurs before the audio data transfer completes. The behavior is the same whether a PipedInputStream or an AudioInputStream is passed as input; an exception is thrown while sending binary data in both cases.

Best Answer

The Java SDK has an example of this, and supports it.

Update your pom.xml with:

<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>3.3.1</version>
</dependency>
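If you build with Gradle rather than Maven, the equivalent declaration (assuming the same artifact coordinates) would be:

compile 'com.ibm.watson.developer_cloud:java-sdk:3.3.1'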

Here is an example of how to listen to the microphone.
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("<username>", "<password>");

// Signed PCM AudioFormat with 16kHz, 16 bit sample size, mono
int sampleRate = 16000;
AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

if (!AudioSystem.isLineSupported(info)) {
  System.out.println("Line not supported");
  System.exit(0);
}

TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
line.open(format);
line.start();

AudioInputStream audio = new AudioInputStream(line);

RecognizeOptions options = new RecognizeOptions.Builder()
    .continuous(true)
    .interimResults(true)
    .timestamps(true)
    .wordConfidence(true)
    //.inactivityTimeout(5) // use this to stop listening when the speaker pauses, i.e. for 5s
    .contentType(HttpMediaType.AUDIO_RAW + "; rate=" + sampleRate)
    .build();

service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechResults speechResults) {
    System.out.println(speechResults);
  }
});

System.out.println("Listening to your voice for the next 30s...");
Thread.sleep(30 * 1000);

// closing the WebSockets underlying InputStream will close the WebSocket itself.
line.stop();
line.close();

System.out.println("Fin.");

On java - Streaming audio from a microphone to the IBM Watson SpeechToText web service using the Java SDK, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37232560/
