
python - Getting a timestamp for every transcribed word with the Google Cloud Speech API?

Reposted. Author: 行者123. Updated: 2023-11-30 22:18:17

I want to transcribe an audio file with the Google Cloud Speech API. This simple script takes a wav file as input and transcribes it with fairly high accuracy.

import os
import sys
import speech_recognition as sr

# open() does not expand "~", so resolve the home directory explicitly.
base_dir = os.path.expanduser("~/Documents/speech-to-text")

# Load the service-account credentials as a JSON string.
with open(os.path.join(base_dir, "speech2textgoogleapi.json")) as f:
    GOOGLE_CLOUD_SPEECH_CREDENTIALS = f.read()

name = sys.argv[1]  # wav file
r = sr.Recognizer()
all_text = []
with sr.AudioFile(name) as source:
    audio = r.record(source)  # read the whole file into memory

# Transcribe audio file
text = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
all_text.append(text)

with open(os.path.join(base_dir, "transcript.txt"), "w") as f:
    f.write(str(all_text))

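How can I use the API to extract other meaningful information from the speech audio? Specifically, I want a timestamp for every word, but other information (such as pitch, amplitude, or speaker identification) would be very welcome. Thanks in advance!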

Best answer

There is actually an example of how to do this in the Speech API documentation:

Using time offsets (timestamps):

Time offset (timestamp) values can be included in the response text for your recognize request. Time offset values show the beginning and end of each spoken word that is recognized in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.

Time offsets are especially useful for analyzing longer audio files, where you may need to search for a particular word in the recognized text and locate it (seek) in the original audio. Time offsets are supported for all our recognition methods: recognize, streamingrecognize, and longrunningrecognize. See below for an example of longrunningrecognize.....
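The seek use case described above can be sketched as follows. This is only an illustration, assuming the word timings have already been pulled out of the API response into plain values in seconds; the `WordTiming` type, the `find_word` helper, and the sample data are hypothetical and not part of the Speech API.

```python
from typing import List, NamedTuple


class WordTiming(NamedTuple):
    """Hypothetical container: one recognized word and its offsets in seconds."""
    word: str
    start: float
    end: float


def find_word(timings: List[WordTiming], target: str) -> List[WordTiming]:
    """Return every occurrence of `target`, compared case-insensitively."""
    needle = target.lower()
    return [t for t in timings if t.word.lower() == needle]


# Made-up sample data standing in for values read from an API response.
sample = [
    WordTiming("hello", 0.0, 0.4),
    WordTiming("world", 0.4, 0.9),
    WordTiming("Hello", 1.5, 1.9),
]

# Each hit's start offset is the position to seek to in the original audio.
for hit in find_word(sample, "hello"):
    print("{} at {:.1f}s-{:.1f}s".format(hit.word, hit.start, hit.end))
```

Keeping the timings in a flat list makes the lookup a simple linear scan, which is usually enough for a single transcript.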

Here is the Python code sample:

def transcribe_gcs_with_word_time_offsets(gcs_uri):
    """Transcribe the given audio file asynchronously and output the word time
    offsets."""
    # Note: the enums and types modules exist only in google-cloud-speech
    # releases before 2.0; this sample targets that older client.
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US',
        enable_word_time_offsets=True)  # request per-word start/end offsets

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Renamed from `result` so the loop variable no longer shadows the response.
    for result in response.results:
        alternative = result.alternatives[0]
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))

        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            # Offsets carry whole seconds plus nanoseconds; combine into a float.
            print('Word: {}, start_time: {}, end_time: {}'.format(
                word,
                start_time.seconds + start_time.nanos * 1e-9,
                end_time.seconds + end_time.nanos * 1e-9))
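The `seconds + nanos * 1e-9` arithmetic used above can be factored into a small helper. The following is a minimal sketch, assuming only that an offset exposes integer `seconds` and `nanos` attributes as in the sample; `Offset`, `to_seconds`, and `format_timestamp` are hypothetical names introduced for illustration, and `Offset` is a stand-in so the snippet runs without the API.

```python
from collections import namedtuple

# Hypothetical stand-in for an offset object with `seconds` and `nanos` fields.
Offset = namedtuple("Offset", ["seconds", "nanos"])


def to_seconds(offset):
    """Combine whole seconds and nanoseconds into one float."""
    return offset.seconds + offset.nanos * 1e-9


def format_timestamp(total_seconds):
    """Format a duration in seconds as MM:SS.mmm."""
    millis = int(round(total_seconds * 1000))
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "{:02d}:{:02d}.{:03d}".format(minutes, secs, millis)


# 75 seconds and 300,000,000 nanoseconds is 1 minute 15.3 seconds.
print(format_timestamp(to_seconds(Offset(seconds=75, nanos=300000000))))  # 01:15.300
```

Rounding to whole milliseconds before splitting avoids floating-point noise showing up in the formatted string.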

Hope this helps.

Regarding "python - Getting a timestamp for every transcribed word with the Google Cloud Speech API?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49415038/
