python - How to get word-level timestamps in OpenAI's Whisper ASR?

Reposted. Author: 行者123. Updated: 2023-12-02 05:46:03
I use OpenAI's Whisper Python library for speech recognition. How can I get word-level timestamps?


To transcribe with OpenAI's Whisper (tested on Ubuntu 20.04 x64 LTS with an Nvidia GeForce RTX 3090):

conda create -y --name whisperpy39 python==3.9
conda activate whisperpy39
pip install git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg
whisper recording.wav
whisper recording.wav --model large

If you are using an Nvidia GeForce RTX 3090, add the following after conda activate whisperpy39:

pip install -f https://download.pytorch.org/whl/torch_stable.html
conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch

Best answer

https://openai.com/blog/whisper/ only mentions "phrase-level timestamps", from which I infer that word-level timestamps cannot be obtained without adding more code.
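To make "phrase-level" concrete, here is a minimal sketch of reading the per-segment timestamps out of a transcription result. It assumes the Python API returns a dict with a "segments" list whose entries carry "start", "end" and "text" keys. The helper names format_segments and transcribe_file are hypothetical, and the sample dict is hand-written with made-up timings so the formatting part runs without a model or audio file.

```python
def format_segments(result):
    """Return one '[start -> end] text' line per Whisper segment (phrase level)."""
    lines = []
    for seg in result["segments"]:
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {text}")
    return lines


def transcribe_file(path, model_name="base"):
    """Run Whisper on an audio file; needs the whisper package and ffmpeg installed."""
    import whisper  # imported lazily so format_segments works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path)


# Hand-written sample in the same shape as a transcription result.
# The text and timings are invented for illustration only.
sample_result = {
    "text": " Hello world. This is a test.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.8, "text": " Hello world."},
        {"id": 1, "start": 1.8, "end": 4.2, "text": " This is a test."},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

With a real recording, format_segments(transcribe_file("recording.wav")) would give one line per phrase. Nothing finer than a segment is available from these fields.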

From one of the Whisper authors:

Getting word-level timestamps are not directly supported, but it could be possible using the predicted distribution over the timestamp tokens or the cross-attention weights.
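Releases of the whisper package published after this answer was written appear to build on the cross-attention idea: transcribe() accepts a word_timestamps=True argument, and each segment then carries a "words" list with "word", "start" and "end" entries. The sketch below assumes such a release is installed, so check your version before relying on it. The helper names extract_words and transcribe_with_words are hypothetical, and the sample data is hand-written with made-up timings.

```python
def extract_words(result):
    """Flatten the per-segment 'words' lists into (word, start, end) tuples."""
    words = []
    for seg in result["segments"]:
        # The "words" key is absent when word timing was not requested
        # or the installed version does not support it.
        for w in seg.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


def transcribe_with_words(path, model_name="base"):
    """Ask Whisper for word timing; assumes a release that supports word_timestamps."""
    import whisper  # imported lazily so extract_words works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, word_timestamps=True)


# Hand-written sample in the assumed result shape; timings are invented.
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.8,
            "text": " Hello world.",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.6},
                {"word": " world.", "start": 0.6, "end": 1.8},
            ],
        },
        # A segment without a "words" key is skipped rather than raising.
        {"start": 1.8, "end": 4.2, "text": " This is a test."},
    ]
}

for word, start, end in extract_words(sample_result):
    print(f"{start:.2f}s -> {end:.2f}s  {word}")
```

If extract_words returns an empty list for a real recording, the installed version most likely predates this option, and the approaches listed in this answer still apply.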

https://github.com/jianfch/stable-ts (MIT License):

This script modifies methods of Whisper's model to gain access to the predicted timestamp tokens of each word without needing addition inference. It also stabilizes the timestamps down to the word level to ensure chronology.

Note:


Another option: use a word-level forced alignment program. For example, Lhotse (Apache-2.0 license) has integrated both Whisper ASR and Wav2vec forced alignment.

The original question, "python - How to get word-level timestamps in OpenAI's Whisper ASR?", is on Stack Overflow: https://stackoverflow.com/questions/73822353/
