
python - Google Speech API - no transcription returned

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 15:28:59

I am using the sample code provided here and have implemented the following:

# [START import_libraries]
import argparse
import base64
import json
import time
from oauth2client.service_account import ServiceAccountCredentials
import googleapiclient.discovery
import googleapiclient as gac
# [END import_libraries]


# [START authenticating]


# Application default credentials provided by env variable
# GOOGLE_APPLICATION_CREDENTIALS
def get_speech_service(credentials):
    return googleapiclient.discovery.build(
        'speech', 'v1beta1', credentials=credentials)


def main(speech_file):
    """Transcribe the given audio file asynchronously.
    Args:
        speech_file: the name of the audio file.
    """
    # [START construct_request]
    with open(speech_file, 'rb') as speech:
        # Base64 encode the binary audio file for inclusion in the request.
        speech_content = base64.b64encode(speech.read())

    # print speech_content

    scopes = ['https://www.googleapis.com/auth/cloud-platform']

    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        '/Users/user/Documents/google_cloud/myjson.json', scopes)

    service = get_speech_service(credentials)
    service_request = service.speech().asyncrecognize(
        body={
            'config': {
                # There are a bunch of config options you can specify. See
                # https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig for the full list.
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 khz
                # See http://g.co/cloud/speech/docs/languages for a list of
                # supported languages.
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
            }
        })
    # [END construct_request]
    # [START send_request]
    response = service_request.execute()
    print(json.dumps(response))
    # [END send_request]

    name = response['name']
    # Construct a GetOperation request.
    service_request = service.operations().get(name=name)

    while True:
        # Give the server a few seconds to process.
        print('Waiting for server processing...')
        time.sleep(1)
        # Get the long running operation with response.
        response = service_request.execute()

        if 'done' in response and response['done']:
            break

    # First print the raw json response
    print(json.dumps(response['response'], indent=2))

    # Now print the actual transcriptions
    out = []
    for result in response['response'].get('results', []):
        print('poo')  # debug marker: never printed, because 'results' is missing
        print('Result:')
        for alternative in result['alternatives']:
            print(u' Alternative: {}'.format(alternative['transcript']))
        out.append(result)
    return response

r = main("/Users/user/Downloads/brooklyn.flac")

But this is what gets printed:

{"name": "3202776140236290963"}
Waiting for server processing...
Waiting for server processing...
{
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
}

The object I get back is:

{u'done': True,
u'metadata': {u'@type': u'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata',
u'lastUpdateTime': u'2017-03-25T15:54:46.136925Z',
u'progressPercent': 100,
u'startTime': u'2017-03-25T15:54:44.514614Z'},
u'name': u'2024312474309214820',
u'response': {u'@type': u'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse'}}
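
One way to make this symptom explicit is to collect the transcripts from the finished operation and treat a missing results key as an empty result. This is only a minimal sketch that assumes the response has the shape printed above; extract_transcripts is a hypothetical helper name, not part of the Google client library:

```python
def extract_transcripts(operation):
    """Return the transcripts found in a finished operation dict.

    Assumes the dict shape printed above. An empty list means the service
    finished without returning any recognition results.
    """
    if not operation.get('done'):
        raise ValueError('operation has not finished yet')
    if 'error' in operation:
        raise RuntimeError('operation failed: {}'.format(operation['error']))
    results = operation.get('response', {}).get('results', [])
    return [alt['transcript']
            for result in results
            for alt in result.get('alternatives', [])]


# The object returned above, reduced to the relevant keys.
operation = {
    'done': True,
    'name': '2024312474309214820',
    'response': {
        '@type': 'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse',
    },
}
print(extract_transcripts(operation))  # prints []
```

An empty list here means the request was accepted and processed but nothing was recognized.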

In my console I can see the requests being sent. (The screenshot is not reproduced here.)

I am not sure why I am not getting the correct transcription for the sample file.

Any input is welcome!

Best answer

Your config options are:

        'config': {
            # There are a bunch of config options you can specify. See
            # https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig for the full list.
            'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
            'sampleRate': 16000,  # 16 khz
            # See http://g.co/cloud/speech/docs/languages for a list of
            # supported languages.
            'languageCode': 'en-US',  # a BCP-47 language tag
        },

However, you are using a FLAC file:

r = main("/Users/user/Downloads/brooklyn.flac")
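
A mismatch like this can be caught before the request is sent by looking at the first bytes of the file. The sketch below relies on the fact that a FLAC stream starts with the marker fLaC and a WAV file with RIFF....WAVE; the function names are hypothetical, and a FLAC file with a prepended tag (for example ID3) would not be detected by this simple check:

```python
def sniff_audio_container(header):
    """Guess the container from the first bytes of a file.

    Returns 'FLAC', 'WAV', or None if neither signature matches.
    """
    if header[:4] == b'fLaC':
        return 'FLAC'
    if header[:4] == b'RIFF' and header[8:12] == b'WAVE':
        return 'WAV'
    return None


def check_declared_encoding(path, declared_encoding):
    """Raise if the file is FLAC but the request declares another encoding."""
    with open(path, 'rb') as f:
        container = sniff_audio_container(f.read(12))
    if container == 'FLAC' and declared_encoding != 'FLAC':
        raise ValueError(
            '{} is a FLAC file but the config declares {}'.format(
                path, declared_encoding))
    return container
```

Calling check_declared_encoding with the path from the question and 'LINEAR16' would raise a ValueError instead of silently returning an empty response.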

Quoting https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig:

LINEAR16

Uncompressed 16-bit signed little-endian samples (Linear PCM). This is the only encoding that may be used by speech.asyncrecognize.

FLAC

This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec.

In other words, you cannot use FLAC with speech.asyncrecognize. You will need to either transcode the sample to Linear PCM first, or use speech.syncrecognize with the FLAC encoding option.
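
Both options can be sketched roughly as follows. The helper names are hypothetical, the request fields are the ones used in the question, and the 16000 Hz sample rate is carried over from the question's config on the assumption that it matches the file. The transcoding option assumes the ffmpeg command-line tool is installed, and the commented calls assume the discovery client exposes syncrecognize in the same way as the asyncrecognize call in the question:

```python
import base64


def build_recognize_body(audio_bytes, encoding, sample_rate,
                         language_code='en-US'):
    """Build a v1beta1 request body with the field names used in the question."""
    return {
        'config': {
            'encoding': encoding,
            'sampleRate': sample_rate,
            'languageCode': language_code,
        },
        'audio': {
            'content': base64.b64encode(audio_bytes).decode('UTF-8'),
        },
    }


def ffmpeg_to_linear16_cmd(src, dst, sample_rate=16000):
    """Return an ffmpeg command that writes raw 16-bit LE mono PCM to dst."""
    return ['ffmpeg', '-i', src,
            '-f', 's16le', '-acodec', 'pcm_s16le',
            '-ar', str(sample_rate), '-ac', '1', dst]


# Option 1: transcode first, then keep LINEAR16 with asyncrecognize.
#   import subprocess
#   subprocess.check_call(ffmpeg_to_linear16_cmd('brooklyn.flac', 'brooklyn.raw'))
#   with open('brooklyn.raw', 'rb') as f:
#       body = build_recognize_body(f.read(), 'LINEAR16', 16000)
#   response = service.speech().asyncrecognize(body=body).execute()
#
# Option 2: send the FLAC file as-is through the synchronous method.
#   with open('brooklyn.flac', 'rb') as f:
#       body = build_recognize_body(f.read(), 'FLAC', 16000)
#   response = service.speech().syncrecognize(body=body).execute()
```

In either case the declared encoding and sample rate must describe the bytes that are actually sent.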

This post is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/43018656/
