python - How to get word-level timestamps in OpenAI's Whisper ASR?


Reposted by: 行者123. Last updated: 2023-12-02 22:44:24


I am using OpenAI's Whisper Python library for speech recognition. How can I get word-level timestamps?

To transcribe with OpenAI's Whisper (tested on Ubuntu 20.04 x64 LTS with an Nvidia GeForce RTX 3090):

conda create -y --name whisperpy39 python==3.9
conda activate whisperpy39
pip install git+https://github.com/openai/whisper.git 
sudo apt update && sudo apt install ffmpeg
whisper recording.wav
whisper recording.wav --model large

If you use an Nvidia GeForce RTX 3090, add the following after conda activate whisperpy39:

pip install -f https://download.pytorch.org/whl/torch_stable.html
conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch
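The `whisper recording.wav` commands above have a Python counterpart, and it shows what the library returns by default: one start/end pair per segment (phrase), not per word. The following is a minimal sketch, assuming the `openai-whisper` package is installed and that `transcribe()` returns a dict with a `"segments"` list whose entries carry `"start"`, `"end"`, and `"text"`. The helper names (`format_timestamp`, `segment_lines`, `transcribe_file`) are made up for this illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(result: dict) -> list:
    """Turn a Whisper result dict into one printable line per segment."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result["segments"]
    ]


def transcribe_file(path: str, model_name: str = "base") -> dict:
    """Transcribe an audio file (hypothetical wrapper around the library calls)."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path)
```

Calling `print("\n".join(segment_lines(transcribe_file("recording.wav"))))` would print one timestamped line per segment, which is the granularity the question is trying to get below.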

Best answer

https://openai.com/blog/whisper/ only mentions "phrase-level timestamps", from which I infer that word-level timestamps cannot be obtained without adding more code.

From one of the Whisper authors:

Getting word-level timestamps are not directly supported, but it could be possible using the predicted distribution over the timestamp tokens or the cross-attention weights.
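The quote above describes the library as it was when this answer was written. Later releases of `openai-whisper` added a `word_timestamps` option to `transcribe()`, which derives word timing from the cross-attention weights mentioned in the quote. The sketch below assumes such a release is installed and that each segment then carries a `"words"` list with `"word"`, `"start"`, and `"end"` keys; the helper names `iter_words` and `transcribe_with_words` are hypothetical.

```python
def iter_words(result: dict):
    """Yield (word, start, end) tuples from a result made with word_timestamps=True."""
    for segment in result.get("segments", []):
        # Segments have no "words" key when word timestamps were not requested.
        for w in segment.get("words", []):
            yield w["word"].strip(), w["start"], w["end"]


def transcribe_with_words(path: str, model_name: str = "base") -> dict:
    """Transcribe and request word-level timing (assumes a recent openai-whisper)."""
    # Imported lazily so iter_words can be used on saved results without the package.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path, word_timestamps=True)
```

For example, `for word, start, end in iter_words(transcribe_with_words("recording.wav")): print(f"{start:.2f}-{end:.2f} {word}")` would print one line per word. On an older install that lacks the option, the approaches below remain the way to go.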

https://github.com/jianfch/stable-ts (MIT License):

This script modifies methods of Whisper's model to gain access to the predicted timestamp tokens of each word without needing addition inference. It also stabilizes the timestamps down to the word level to ensure chronology.

Notes:

It is unclear how accurate these word-level timestamps are.
Subtitles sometimes go out of sync.
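Since accuracy is uncertain, a cheap sanity check on whatever word timings a tool produces can at least catch broken chronology before subtitles are generated. This is only a sketch: it assumes the words have already been collected as `(word, start, end)` tuples in seconds, and the function name `chronology_issues` is made up for this example. It cannot tell whether a timestamp is accurate, only whether the sequence is internally consistent.

```python
def chronology_issues(words, tolerance: float = 0.0):
    """Return (index, word, problem) for words whose timing breaks chronology.

    `words` is an iterable of (word, start, end) tuples, times in seconds.
    `tolerance` allows a small overlap between neighbouring words.
    """
    issues = []
    prev_end = None
    for i, (word, start, end) in enumerate(words):
        if end < start:
            issues.append((i, word, "end before start"))
        if prev_end is not None and start + tolerance < prev_end:
            issues.append((i, word, "overlaps previous word"))
        # Track the latest end time seen so far.
        prev_end = end if prev_end is None else max(prev_end, end)
    return issues
```

An empty return value means every word ends no earlier than it starts and no word begins before the previous ones have finished (within the tolerance).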

Another option: use a word-level forced alignment program. For example, Lhotse (Apache-2.0 license) has integrated both Whisper ASR and Wav2vec forced alignment.

Regarding "python - How to get word-level timestamps in OpenAI's Whisper ASR?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73822353/
