I am trying to send a continuous stream of audio from the microphone directly to the IBM Watson SpeechToText Web service using the Java SDK. One of the examples provided with the distribution (RecognizeUsingWebSocketsExample) shows how to stream a file in .WAV format to the service. However, .WAV files require their length to be specified ahead of time, so the naive approach of appending one buffer at a time to the file is not feasible.

It appears that SpeechToText.recognizeUsingWebSocket can accept a stream, but feeding it an instance of AudioInputStream does not seem to work: the connection is established, but no transcripts are returned, even with RecognizeOptions.interimResults(true).
public class RecognizeUsingWebSocketsExample {
  private static CountDownLatch lock = new CountDownLatch(1);

  public static void main(String[] args) throws FileNotFoundException, InterruptedException {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<username>", "<password>");

    AudioInputStream audio = null;
    try {
      final AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
      DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
      TargetDataLine line;
      line = (TargetDataLine) AudioSystem.getLine(info);
      line.open(format);
      line.start();
      audio = new AudioInputStream(line);
    } catch (LineUnavailableException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }

    RecognizeOptions options = new RecognizeOptions.Builder()
        .continuous(true)
        .interimResults(true)
        .contentType(HttpMediaType.AUDIO_WAV)
        .build();

    service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
      @Override
      public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
        if (speechResults.isFinal())
          lock.countDown();
      }
    });

    lock.await(1, TimeUnit.MINUTES);
  }
}
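To make the .WAV length problem concrete, here is a minimal sketch (not one of the SDK examples) showing that a microphone line produces a stream of unspecified length, which the JDK's built-in WAVE writer will not stream to a plain OutputStream because it cannot fill in the RIFF header up front. The class name is made up for illustration; the exact exception message may vary by JDK.

  import javax.sound.sampled.*;
  import java.io.ByteArrayOutputStream;

  public class WavLengthSketch {
    public static void main(String[] args) throws Exception {
      AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
      TargetDataLine line = AudioSystem.getTargetDataLine(format);
      line.open(format);
      line.start();

      AudioInputStream audio = new AudioInputStream(line);
      // The frame length of a live capture stream is unknown
      System.out.println(audio.getFrameLength()); // -1, i.e. AudioSystem.NOT_SPECIFIED

      // The WAVE writer needs the total length to write the header, so this
      // typically fails with "java.io.IOException: stream length not specified"
      AudioSystem.write(audio, AudioFileFormat.Type.WAVE, new ByteArrayOutputStream());
    }
  }

This is why the audio has to be sent in a format that does not require the total length up front.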
As a workaround, the audio captured from the microphone is converted to FLAC on the fly and written to a temporary file:

  WAV_audioInputStream = new AudioInputStream(line);
  FileInputStream FLAC_audioInputStream = new FileInputStream(tempFile);

  StreamConfiguration streamConfiguration = new StreamConfiguration();
  streamConfiguration.setSampleRate(16000);
  streamConfiguration.setBitsPerSample(8);
  streamConfiguration.setChannelCount(1);

  flacEncoder = new FLACEncoder();
  flacOutputStream = new FLACFileOutputStream(tempFile); // write to temp disk file
  flacEncoder.setStreamConfiguration(streamConfiguration);
  flacEncoder.setOutputStream(flacOutputStream);
  flacEncoder.openFLACStream();

  ...

  // convert data
  int frameLength = 16000;
  int[] intBuffer = new int[frameLength];
  byte[] byteBuffer = new byte[frameLength];
  while (true) {
    int count = WAV_audioInputStream.read(byteBuffer, 0, frameLength);
    for (int j1 = 0; j1 < count; j1++)
      intBuffer[j1] = byteBuffer[j1];
    flacEncoder.addSamples(intBuffer, count);
    flacEncoder.encodeSamples(count, false); // 'false' means non-final frame
  }
  flacEncoder.encodeSamples(flacEncoder.samplesAvailableToEncode(), true); // final frame
  WAV_audioInputStream.close();
  flacOutputStream.close();
  FLAC_audioInputStream.close();
The resulting FLAC file can then be sent to the service (with either curl or recognizeUsingWebSocket()). However, recognizeUsingWebSocket() returns its final result as soon as it reaches the end of the FLAC file, even though the file's last frame may not be final (i.e. it was written after encodeSamples(count, false)). I could find no way to make recognizeUsingWebSocket() block until the final frame is written to the file. In practice, this means the analysis stops after the first frame, since analyzing the first frame takes less time than collecting the second, so by the time the results are returned, the end of the file has already been reached.
Below is a modification of RecognizeUsingWebSocketsExample that incorporates some of Daniel's suggestions below. It uses the PCM content type (passed as a String together with the frame size) and attempts to signal the end of the audio stream, albeit not very successfully.
public static void main(String[] args) throws IOException, LineUnavailableException, InterruptedException {
  final PipedOutputStream output = new PipedOutputStream();
  final PipedInputStream input = new PipedInputStream(output);

  final AudioFormat format = new AudioFormat(16000, 8, 1, true, false);
  DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
  final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
  line.open(format);
  line.start();

  Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        final int MAX_FRAMES = 2;
        byte buffer[] = new byte[16000];
        for (int j1 = 0; j1 < MAX_FRAMES; j1++) { // read two frames from microphone
          int count = line.read(buffer, 0, buffer.length);
          System.out.println("Read audio frame from line: " + count);
          output.write(buffer, 0, buffer.length);
          System.out.println("Written audio frame to pipe: " + count);
        }
        /** no need to fake end-of-audio; StopMessage will be sent
         * automatically by SDK once the pipe is drained (see WebSocketManager)
        // signal end of audio; based on WebSocketUploader.stop() source
        byte[] stopData = new byte[0];
        output.write(stopData);
        **/
      } catch (IOException e) {
      }
    }
  });
  thread1.start();

  final CountDownLatch lock = new CountDownLatch(1);

  SpeechToText service = new SpeechToText();
  service.setUsernameAndPassword("<username>", "<password>");

  RecognizeOptions options = new RecognizeOptions.Builder()
      .continuous(true)
      .interimResults(false)
      .contentType("audio/pcm; rate=16000")
      .build();

  service.recognizeUsingWebSocket(input, options, new BaseRecognizeCallback() {
    @Override
    public void onConnected() {
      System.out.println("Connected.");
    }

    @Override
    public void onTranscription(SpeechResults speechResults) {
      System.out.println("Received results.");
      System.out.println(speechResults);
      if (speechResults.isFinal())
        lock.countDown();
    }
  });

  System.out.println("Waiting for STT callback ... ");

  lock.await(5, TimeUnit.SECONDS);

  line.stop();

  System.out.println("Done waiting for STT callback.");
}
I also checked out the source of WebSocketManager (which comes with the SDK) and replaced the call to sendMessage() with an explicit StopMessage payload, as follows:
/**
 * Send input steam.
 *
 * @param inputStream the input stream
 * @throws IOException Signals that an I/O exception has occurred.
 */
private void sendInputSteam(InputStream inputStream) throws IOException {
  int cumulative = 0;
  byte[] buffer = new byte[FOUR_KB];
  int read;
  while ((read = inputStream.read(buffer)) > 0) {
    cumulative += read;
    if (read == FOUR_KB) {
      socket.sendMessage(RequestBody.create(WebSocket.BINARY, buffer));
    } else {
      System.out.println("completed sending " + cumulative / 16000 + " frames over socket");
      socket.sendMessage(RequestBody.create(WebSocket.BINARY, Arrays.copyOfRange(buffer, 0, read))); // partial buffer write
      System.out.println("signaling end of audio");
      socket.sendMessage(RequestBody.create(WebSocket.TEXT, buildStopMessage().toString())); // end of audio signal
    }
  }
  inputStream.close();
}
The console output is:

Waiting for STT callback ...
Connected.
Read audio frame from line: 16000
Written audio frame to pipe: 16000
Read audio frame from line: 16000
Written audio frame to pipe: 16000
completed sending 2 frames over socket
onFailure: java.net.SocketException: Software caused connection abort: socket write error
Looking at the WebSocketManager source code, onMessage() already sends StopMessage immediately upon return from sendInputSteam() (i.e. when the audio stream, or the pipe in the example above, is drained), so there is no need to send it explicitly. The problem is definitely happening before the audio data transmission completes. The behavior is the same whether a PipedInputStream or an AudioInputStream is passed as input; in both cases the exception is thrown while the binary data is being sent.
Best Answer
The Java SDK supports this and includes an example. Update your pom.xml with:
<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>3.3.1</version>
</dependency>
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("<username>", "<password>");

// Signed PCM AudioFormat with 16kHz, 16 bit sample size, mono
int sampleRate = 16000;
AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

if (!AudioSystem.isLineSupported(info)) {
  System.out.println("Line not supported");
  System.exit(0);
}

TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
line.open(format);
line.start();

AudioInputStream audio = new AudioInputStream(line);

RecognizeOptions options = new RecognizeOptions.Builder()
    .continuous(true)
    .interimResults(true)
    .timestamps(true)
    .wordConfidence(true)
    //.inactivityTimeout(5) // use this to stop listening when the speaker pauses, i.e. for 5s
    .contentType(HttpMediaType.AUDIO_RAW + "; rate=" + sampleRate)
    .build();

service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechResults speechResults) {
    System.out.println(speechResults);
  }
});

System.out.println("Listening to your voice for the next 30s...");
Thread.sleep(30 * 1000);

// closing the WebSockets underlying InputStream will close the WebSocket itself.
line.stop();
line.close();

System.out.println("Fin.");
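Since, as the comment in the snippet notes, closing the WebSocket's underlying InputStream closes the WebSocket itself, the fixed 30-second Thread.sleep() can be replaced with an on-demand stop. A minimal sketch, reusing the line variable from the code above (it additionally needs java.io.BufferedReader and java.io.InputStreamReader imports, and the IOException from readLine() must be handled or declared):

  // Sketch: stop recognizing when the user presses Enter instead of after a fixed 30 s.
  // Stopping and closing the TargetDataLine ends the AudioInputStream, which in turn
  // closes the WebSocket session.
  System.out.println("Press Enter to stop recording...");
  new BufferedReader(new InputStreamReader(System.in)).readLine();
  line.stop();
  line.close();
  System.out.println("Fin.");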
About java - streaming audio from the microphone to the IBM Watson SpeechToText Web service using the Java SDK: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37232560/