java - Incorrect highlighting in a custom TextToSpeechService

Reposted. Author: 行者123. Updated: 2023-12-01 18:00:40

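I extended TextToSpeechService from the Android API to build my own custom TTS service. The service fetches its data from a TTS server. The server returns an audio buffer together with several lists that hold the start position of each highlight and its timing. The problem is that my highlight appears too early: it lands on the next word the TTS engine is about to speak. I can't find what causes this. I suspected audioPositionMillis might be wrong, but as far as I can tell the calculation is correct. My estimate is that audioPositionMillis runs about 700 ms ahead. I am probably overlooking something small.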

@Override
protected synchronized void onSynthesizeText(SynthesisRequest request, SynthesisCallback callback) {

    // Note that we call onLoadLanguage here since there is no guarantee
    // that there was a prior call to this function.
    int load = onLoadLanguage(request.getLanguage(), request.getCountry(), request.getVariant());

    // We might get requests for a language we don't support - in which case
    // we error out early before wasting too much time.
    if (load == TextToSpeech.LANG_NOT_SUPPORTED) {
        callback.error();
        return;
    }

    String ttsText = request.getCharSequenceText().toString();
    final int speechRate = mapSpeechRate(request.getSpeechRate());
    TtsParams ttsParams = new TtsParams(ttsText, currentVoice, speechRate, VOLUME,
            TIME_BETWEEN_SENTENCES_MILLIS, BIT_RATE, TtsParams.Format.WAV);

    try {
        TtsInfo data = null;
        // Synchronous call because methods executed on the synthesisCallback
        // need to be called on the synth thread.
        Response<TtsInfo> response = serviceManager.getTtsInfo(ttsParams);
        if (response != null) {
            data = response.body();
        }

        if (data == null) {
            callback.error();
            return;
        }

        // Response does not make any sense to me, we modify its data
        List<Integer> wordPositionsMs = data.getAudioPos();
        List<Integer> wordStartPositions = data.getCharPos();
        List<Integer> wordLengths = data.getCharCount();

        wordStartPositions.add(0, 0);
        wordStartPositions.remove(wordStartPositions.size() - 1);

        wordPositionsMs.add(0, 102); // First word always starts at 102ms according to the docs
        wordPositionsMs.remove(wordStartPositions.size() - 1);

        callback.start(SAMPLING_RATE_HZ, AudioFormat.ENCODING_PCM_16BIT, CHANNEL_COUNT);
        int maxBufferSize = callback.getMaxBufferSize();
        byte[] audioBuffer = Base64.decode(data.getByteArray(), Base64.DEFAULT);
        int offset = 0;
        while (offset < audioBuffer.length) {
            int bytesToWrite = Math.min(maxBufferSize, audioBuffer.length - offset);
            if (callback.audioAvailable(audioBuffer, offset, bytesToWrite) != TextToSpeech.SUCCESS) {
                callback.error();
                return;
            }

            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
                long audioPositionMillis = Math.round(offset / ((SAMPLING_RATE_HZ / 1000D) * CHANNEL_COUNT * (BIT_DEPTH / 8D)));
                int wordIndex = -1;
                for (int i = 0; i < wordPositionsMs.size(); i++) {
                    if (audioPositionMillis > wordPositionsMs.get(i)) {
                        wordIndex++;
                    } else {
                        break;
                    }
                }

                if (wordIndex > -1) {
                    int wordStart = wordStartPositions.get(wordIndex);
                    int wordLength = wordLengths.get(wordIndex);
                    callback.rangeStart(-1, wordStart, wordStart + wordLength);
                }
            }

            offset += bytesToWrite;
        }
        callback.done();
    } catch (IOException | NoNetworkException e) {
        e.printStackTrace();
        callback.error();
    }
}
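
The position arithmetic in the loop above can be checked in isolation. What follows is a minimal sketch, not the asker's code: the class name PlaybackMath and the 22050 Hz, mono, 16-bit figures used in the usage note are assumptions chosen for illustration. Only the formula and the strict `>` comparison are taken from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name and layout are illustrative only.
final class PlaybackMath {

    // Converts a byte offset into raw PCM data to a position in milliseconds,
    // using the same formula as the question.
    static long audioPositionMillis(int byteOffset, int sampleRateHz, int channelCount, int bitDepth) {
        double bytesPerMilli = (sampleRateHz / 1000D) * channelCount * (bitDepth / 8D);
        return Math.round(byteOffset / bytesPerMilli);
    }

    // Returns the index of the last word whose start time is strictly before
    // positionMillis, or -1 if no word has started yet.
    static int wordIndexAt(List<Integer> wordPositionsMs, long positionMillis) {
        int wordIndex = -1;
        for (int startMs : wordPositionsMs) {
            if (positionMillis > startMs) {
                wordIndex++;
            } else {
                break;
            }
        }
        return wordIndex;
    }

    public static void main(String[] args) {
        List<Integer> starts = Arrays.asList(102, 500, 900);
        long ms = audioPositionMillis(44100, 22050, 1, 16);
        System.out.println(ms + " ms, word index " + wordIndexAt(starts, ms));
    }
}
```

With these assumed values, 44100 bytes of 16-bit mono audio at 22050 Hz correspond to one second. Keep in mind that offset counts the bytes already handed to callback.audioAvailable, which is not necessarily the position the listener is hearing at that moment.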

Best answer

I was passing -1 as the markerInFrames argument to the rangeStart callback method, and that caused the problem.

Solution:

callback.rangeStart((int)(offset/(BIT_DEPTH/8D)), wordStart, wordStart + wordLength);
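
The conversion behind that argument can be written as a small helper. This is a sketch under stated assumptions, not part of the original answer: the class FrameMarkers and both method names are hypothetical. It relies on a frame being one sample per channel, so the answer's expression (bytes divided by bytes per sample) matches it only when CHANNEL_COUNT is 1, which the answer appears to assume.

```java
// Hypothetical helper class; names are illustrative and not from the answer.
final class FrameMarkers {

    // Converts a byte offset in raw PCM data to a frame count.
    // One frame holds one sample for every channel.
    static int bytesToFrames(int byteOffset, int channelCount, int bitDepth) {
        int bytesPerFrame = channelCount * (bitDepth / 8);
        return byteOffset / bytesPerFrame;
    }

    // Converts a timestamp in milliseconds to a frame count.
    static int millisToFrames(int millis, int sampleRateHz) {
        return (int) ((long) millis * sampleRateHz / 1000L);
    }

    public static void main(String[] args) {
        // Assumed format for illustration: 22050 Hz, mono, 16-bit.
        System.out.println(bytesToFrames(44100, 1, 16));
        System.out.println(millisToFrames(102, 22050));
    }
}
```

One possible variation, which the answer does not mention, is to pass millisToFrames(wordPositionsMs.get(wordIndex), SAMPLING_RATE_HZ) as markerInFrames, so that each word is marked at its server-provided timestamp instead of at the chunk boundary. Whether that gives better alignment depends on how accurate the server timings are.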

Regarding java - Incorrect highlighting in a custom TextToSpeechService, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60638540/
