
python - How do I transcribe a speech file with the Bing Speech API in Python?

Reposted. Author: 行者123. Updated: 2023-11-28 19:05:39

How do I transcribe a speech file with the Bing Speech API in Python? My speech file is longer than 15 seconds.


I know the Bing Speech REST API can be used from Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import requests
    import httplib
    import uuid
    import json

    class Microsoft_ASR():
        def __init__(self):
            self.sub_key = 'YourKeyHere'
            self.token = None

        def get_speech_token(self):
            # Exchange the subscription key for an access token
            FetchTokenURI = "/sts/v1.0/issueToken"
            header = {'Ocp-Apim-Subscription-Key': self.sub_key}
            conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
            body = ""
            conn.request("POST", FetchTokenURI, body, header)
            response = conn.getresponse()
            str_data = response.read()
            conn.close()
            self.token = str_data
            print "Got Token: ", self.token
            return True

        def transcribe(self, speech_file):

            # Grab the token if we need it
            if self.token is None:
                print "No Token... Getting one"
                self.get_speech_token()

            endpoint = 'https://speech.platform.bing.com/recognize'
            request_id = uuid.uuid4()
            # Params from Microsoft Example
            params = {'scenarios': 'ulm',
                      'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                      'locale': 'en-US',
                      'version': '3.0',
                      'format': 'json',
                      'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                      'requestid': str(request_id),
                      'device.os': 'linux'}
            content_type = "audio/wav; codec=""audio/pcm""; samplerate=16000"

            def stream_audio_file(speech_file, chunk_size=1024):
                # Yield the file in chunk_size pieces so requests streams the upload
                with open(speech_file, 'rb') as f:
                    while 1:
                        data = f.read(chunk_size)
                        if not data:
                            break
                        yield data

            headers = {'Authorization': 'Bearer ' + self.token,
                       'Content-Type': content_type}
            resp = requests.post(endpoint,
                                 params=params,
                                 data=stream_audio_file(speech_file),
                                 headers=headers)
            val = json.loads(resp.text)
            return val["results"][0]["name"], val["results"][0]["confidence"]

    if __name__ == "__main__":
        ms_asr = Microsoft_ASR()
        ms_asr.get_speech_token()
        text, confidence = ms_asr.transcribe('Your Wav File Here')
        print "Text: ", text
        print "Confidence: ", confidence
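The `stream_audio_file` helper in the example is what makes the upload streamed: when `data` is a generator, `requests` sends each yielded piece with chunked transfer encoding instead of reading the whole file into memory first. Below is a minimal Python 3 sketch of just that generator, with no network call. `demo_chunk_sizes` and its temporary file are hypothetical, added only to show what gets yielded.

```python
import os
import tempfile


def stream_audio_file(path, chunk_size=1024):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data


def demo_chunk_sizes(total_bytes=2500, chunk_size=1024):
    """Hypothetical demo: write a throwaway file, return each chunk's size."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * total_bytes)
        return [len(chunk) for chunk in stream_audio_file(path, chunk_size)]
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_chunk_sizes())  # [1024, 1024, 452]
```

The last chunk is simply whatever is left over, so the generator works for files of any size.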

However, according to https://learn.microsoft.com/en-us/azure/cognitive-services/speech/home, the Bing Speech REST API cannot transcribe audio files longer than 15 seconds.
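Whether a given file runs into that 15-second limit can be checked locally before calling the service. The following is a minimal Python 3 sketch using the standard-library `wave` module, assuming the input is an uncompressed PCM WAV file. The names `wav_duration_seconds`, `exceeds_rest_limit` and `write_silent_wav` are hypothetical helpers, not part of any Bing API.

```python
import wave

REST_LIMIT_SECONDS = 15.0  # the limit quoted in the question


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def exceeds_rest_limit(path, limit=REST_LIMIT_SECONDS):
    """True if the file is longer than the REST limit."""
    return wav_duration_seconds(path) > limit


def write_silent_wav(path, seconds, rate=16000):
    """Hypothetical demo helper: write 16-bit mono silence of a given length."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
```

For example, after `write_silent_wav("long.wav", 20)`, calling `exceeds_rest_limit("long.wav")` returns `True`.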

Best answer

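You can use Bing Speech to transcribe larger files, up to about 10 minutes of audio, but you need to build a WebSocket client for it, because WebSocket is the alternative interface Bing provides for long audio files. Here is the GitHub repo: bing speech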
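As an aside that is not the WebSocket approach from the answer: a different workaround is to cut the recording into pieces short enough for the REST endpoint and transcribe each piece separately. The sketch below does only the cutting, in Python 3 with the standard-library `wave` module, and assumes an uncompressed PCM WAV input. `split_wav`, `out_prefix` and the 14-second default are hypothetical choices made for illustration.

```python
import wave


def split_wav(path, out_prefix, max_seconds=14.0):
    """Split a PCM WAV file into consecutive pieces of at most max_seconds.

    Returns the list of file names written, in playback order.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_piece = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_piece)
            if not frames:
                break
            name = "%s_%03d.wav" % (out_prefix, index)
            with wave.open(name, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the piece is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(name)
            index += 1
    return written
```

Fixed-length cuts can land in the middle of a word, so transcripts of neighbouring pieces may be wrong at the boundaries; cutting at silences would avoid that but is not shown here.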

Regarding "python - How do I transcribe a speech file with the Bing Speech API in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47344310/
