
python - Azure pronunciation assessment SDK returns incorrect results compared to the API call

Reposted. Author: 行者123. Last updated: 2023-12-03 02:02:43

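I am using the Azure Speech SDK for pronunciation assessment. It works fine when I call the REST API that Azure provides, but the results are incorrect when I use the Speech SDK. I followed the sample in the cognitive services speech sdk.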

Here is the code I use for the SDK:

    def speech_recognition_with_pull_stream(self):
        class WavFileReaderCallback(speechsdk.audio.PullAudioInputStreamCallback):
            def __init__(self, filename: str):
                super().__init__()
                self._file_h = wave.open(filename, mode=None)

                self.sample_width = self._file_h.getsampwidth()

                assert self._file_h.getnchannels() == 1
                assert self._file_h.getsampwidth() == 2
                # assert self._file_h.getframerate() == 16000 #comment this line because every .wav file read is 48000
                assert self._file_h.getcomptype() == 'NONE'

            def read(self, buffer: memoryview) -> int:
                size = buffer.nbytes
                print(size)
                print(len(buffer))
                frames = self._file_h.readframes(len(buffer) // self.sample_width)

                buffer[:len(frames)] = frames

                return len(frames)

            def close(self):
                self._file_h.close()

        speech_key = os.getenv('AZURE_SUBSCRIPTION_KEY')
        service_region = os.getenv('AZURE_REGION')
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

        # specify the audio format
        wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)

        # setup the audio stream
        callback = WavFileReaderCallback('/Users/146072/Downloads/58638f26-ed07-40b7-8672-1948c814bd69.wav')
        stream = speechsdk.audio.PullAudioInputStream(callback, wave_format)
        audio_config = speechsdk.audio.AudioConfig(stream=stream)

        # instantiate the speech recognizer with pull stream input
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config, language='en-US')

        reference_text = 'We had a great time taking a long walk outside in the morning'
        pronunciation_assessment_config = speechsdk.PronunciationAssessmentConfig(
            reference_text=reference_text,
            grading_system=PronunciationAssessmentGradingSystem.HundredMark,
            granularity=PronunciationAssessmentGranularity.Word,
        )
        pronunciation_assessment_config.phoneme_alphabet = "IPA"
        pronunciation_assessment_config.apply_to(speech_recognizer)
        speech_recognition_result = speech_recognizer.recognize_once()
        print(speech_recognition_result.text)

        # The pronunciation assessment result as a Speech SDK object
        pronunciation_assessment_result = speechsdk.PronunciationAssessmentResult(speech_recognition_result)
        print(pronunciation_assessment_result)

        # The pronunciation assessment result as a JSON string
        pronunciation_assessment_result_json = speech_recognition_result.properties.get(
            speechsdk.PropertyId.SpeechServiceResponse_JsonResult
        )
        print(pronunciation_assessment_result_json)

        return json.loads(pronunciation_assessment_result_json)

Here is the result from the SDK:

    "PronunciationAssessment": {
        "AccuracyScore": 26,
        "FluencyScore": 9,
        "CompletenessScore": 46,
        "PronScore": 19.8
    },
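
One visible difference in the SDK code: `AudioStreamFormat` declares `samples_per_second=16000`, while the commented-out assertion notes that the .wav files are actually 48000 Hz. `readframes()` returns raw PCM without the WAV header, so the declared format is presumably all the service has to interpret the samples. Whether this explains the low scores is not confirmed by the source; it is an assumption worth ruling out. A minimal sketch, using only the standard library, that reads the real format from the file header and compares it with the declared one (`read_wav_format` and `format_mismatches` are hypothetical helper names):

```python
import wave


def read_wav_format(path):
    """Return the PCM format stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return {
            "samples_per_second": wav_file.getframerate(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "channels": wav_file.getnchannels(),
        }


def format_mismatches(actual, declared):
    """List the fields where the declared stream format differs from the file."""
    return [key for key in declared if actual.get(key) != declared[key]]
```

The dictionary keys match the keyword arguments used in the question's `speechsdk.audio.AudioStreamFormat(...)` call, so the result could be passed as `AudioStreamFormat(**read_wav_format(path))`. Whether the service accepts a 48000 Hz stream for pronunciation assessment is not stated in the source; resampling the audio to 16000 Hz beforehand may be the safer option.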

Here is the code for the API call:

    def ackaud(self):
        # f.save(audio)
        # print('file uploaded successfully')

        # a generator which reads audio data chunk by chunk
        # the audio_source can be any audio input stream which provides read() method, e.g. audio file, microphone, memory stream, etc.
        def get_chunk(audio_source, chunk_size=1024):
            while True:
                # time.sleep(chunk_size / 32000) # to simulate human speaking rate
                chunk = audio_source.read(chunk_size)
                if not chunk:
                    # global uploadFinishTime
                    # uploadFinishTime = time.time()
                    break
                yield chunk

        # build pronunciation assessment parameters
        referenceText = 'We had a great time taking a long walk outside in the morning. '

        pronAssessmentParamsJson = "{\"ReferenceText\":\"%s\",\"GradingSystem\":\"HundredMark\",\"Dimension\":\"Comprehensive\",\"EnableMiscue\":\"True\"}" % referenceText
        pronAssessmentParamsBase64 = base64.b64encode(bytes(pronAssessmentParamsJson, 'utf-8'))
        pronAssessmentParams = str(pronAssessmentParamsBase64, "utf-8")

        subscription_key = os.getenv('AZURE_SUBSCRIPTION_KEY')
        region = os.getenv('AZURE_REGION')

        # build request
        url = "https://%s.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=%s&usePipelineVersion=0" % (
            region, 'en-US')
        headers = {'Accept': 'application/json;text/xml',
                   'Connection': 'Keep-Alive',
                   'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',
                   'Ocp-Apim-Subscription-Key': subscription_key,
                   'Pronunciation-Assessment': pronAssessmentParams,
                   'Transfer-Encoding': 'chunked',
                   'Expect': '100-continue'}

        audioFile = open('/Users/146072/Downloads/58638f26-ed07-40b7-8672-1948c814bd69.wav', 'rb')
        # audioFile = f
        # send request with chunked data
        response = requests.post(url=url, data=get_chunk(audioFile), headers=headers)
        # getResponseTime = time.time()
        audioFile.close()

        # latency = getResponseTime - uploadFinishTime
        # print("Latency = %sms" % int(latency * 1000))

        return response.json()
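
The `Pronunciation-Assessment` header value above is assembled with `%` string formatting, which would produce invalid JSON if the reference text ever contained a double quote or backslash. A minimal sketch of the same step using `json.dumps`, with the parameter names and values taken from the question's code (`build_pron_assessment_header` is a hypothetical helper name, not part of any Azure library):

```python
import base64
import json


def build_pron_assessment_header(reference_text, enable_miscue="True"):
    """Build the base64-encoded value for the Pronunciation-Assessment header."""
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": "HundredMark",
        "Dimension": "Comprehensive",
        # Kept as the string "True", exactly as in the question's code.
        "EnableMiscue": enable_miscue,
    }
    # json.dumps escapes quotes and backslashes in the reference text.
    encoded = base64.b64encode(json.dumps(params).encode("utf-8"))
    return encoded.decode("ascii")
```

The returned string can be used in place of `pronAssessmentParams` in the `headers` dictionary.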

Here is the result from the API:

    "AccuracyScore": 100,
    "FluencyScore": 100,
    "CompletenessScore": 100,
    "PronScore": 100,

Is there something wrong with my setup? Thank you very much.

Best answer

Install the latest Speech SDK, 1.26.0, because the REST API uses the generally available (GA) version 3.1.

See the documentation for installing the Speech SDK.
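
Before upgrading, it can help to confirm which SDK version is actually installed. A minimal sketch using the standard library's `importlib.metadata`, assuming the SDK was installed from PyPI under the package name `azure-cognitiveservices-speech`; the helper names are hypothetical, and the version parser is deliberately simple (it stops at the first non-numeric component):

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.26.0' into (1, 26, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_speech_sdk_version(package="azure-cognitiveservices-speech"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(installed, minimum="1.26.0"):
    """True if an installed version string is at or above the minimum."""
    return installed is not None and parse_version(installed) >= parse_version(minimum)
```

For example, `is_at_least(installed_speech_sdk_version())` returning `False` would suggest running `pip install --upgrade azure-cognitiveservices-speech`.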

Regarding "python - Azure pronunciation assessment SDK returns incorrect results compared to the API call", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75606993/
