python - Custom audio input bytes for the Azure Cognitive Speech Translation Service in Python


I need to be able to translate custom audio bytes, which I can get from any source, and translate the speech into my required language (Hindi for now). I have been trying to pass custom audio bytes using the following code in Python:

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioStreamFormat, PullAudioInputStream, PullAudioInputStreamCallback, AudioConfig, PushAudioInputStream


speech_key, service_region = "key", "region"

channels = 1
bitsPerSample = 16
samplesPerSecond = 16000
audioFormat = AudioStreamFormat(samplesPerSecond, bitsPerSample, channels)

class CustomPullAudioInputStreamCallback(PullAudioInputStreamCallback):

    def __init__(self):
        return super(CustomPullAudioInputStreamCallback, self).__init__()

    def read(self, file_bytes):
        print(len(file_bytes))
        return len(file_bytes)

    def close(self):
        return super(CustomPullAudioInputStreamCallback, self).close()


class CustomPushAudioInputStream(PushAudioInputStream):

    def write(self, file_bytes):
        print(type(file_bytes))
        return super(CustomPushAudioInputStream, self).write(file_bytes)

    def close(self):
        return super(CustomPushAudioInputStream, self).close()
translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)

fromLanguage = 'en-US'
toLanguage = 'hi'
translation_config.speech_recognition_language = fromLanguage
translation_config.add_target_language(toLanguage)

translation_config.voice_name = "hi-IN-Kalpana-Apollo"


pull_audio_input_stream_callback = CustomPullAudioInputStreamCallback()
# pull_audio_input_stream = PullAudioInputStream(pull_audio_input_stream_callback, audioFormat)
# custom_pull_audio_input_stream = CustomPushAudioInputStream(audioFormat)

audio_config = AudioConfig(use_default_microphone=False, stream=pull_audio_input_stream_callback)
recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config,
                                                         audio_config=audio_config)


def synthesis_callback(evt):
    size = len(evt.result.audio)
    print('AUDIO SYNTHESIZED: {} byte(s) {}'.format(size, '(COMPLETED)' if size == 0 else ''))
    if size > 0:
        t_sound_file = open("translated_output.wav", "wb+")
        t_sound_file.write(evt.result.audio)
        t_sound_file.close()
        recognizer.stop_continuous_recognition_async()

def recognized_complete(evt):
    if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("RECOGNIZED '{}': {}".format(fromLanguage, evt.result.text))
        print("TRANSLATED into {}: {}".format(toLanguage, evt.result.translations['hi']))
    elif evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("RECOGNIZED: {} (text could not be translated)".format(evt.result.text))
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("NOMATCH: Speech could not be recognized: {}".format(evt.result.no_match_details))
    elif evt.result.reason == speechsdk.ResultReason.Canceled:
        print("CANCELED: Reason={}".format(evt.result.cancellation_details.reason))
        if evt.result.cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("CANCELED: ErrorDetails={}".format(evt.result.cancellation_details.error_details))

def receiving_bytes(audio_bytes):
    # audio_bytes contains the bytes of audio to be translated
    recognizer.synthesizing.connect(synthesis_callback)
    recognizer.recognized.connect(recognized_complete)

    pull_audio_input_stream_callback.read(audio_bytes)
    recognizer.start_continuous_recognition_async()


receiving_bytes(audio_bytes)

Output: AttributeError: 'PullAudioInputStreamCallback' object has no attribute '_impl'
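The error most likely occurs because AudioConfig is given the bare callback rather than a stream: the stream argument expects an AudioInputStream object (which carries the internal _impl handle), so the callback first has to be wrapped in a PullAudioInputStream, as the commented-out line in the code above attempts. A minimal sketch of that approach, using a hypothetical BufferedPullCallback that serves bytes from an in-memory buffer (not the original code; read fills the buffer the SDK passes in and returns the number of bytes copied):

class BufferedPullCallback(PullAudioInputStreamCallback):
    """Hypothetical callback serving raw PCM bytes from an in-memory buffer."""

    def __init__(self, audio_bytes):
        super().__init__()
        self._data = audio_bytes
        self._pos = 0

    def read(self, buffer):
        # Copy up to len(buffer) bytes into the SDK-provided buffer.
        n = min(len(buffer), len(self._data) - self._pos)
        buffer[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n  # returning 0 signals end of stream

callback = BufferedPullCallback(audio_bytes)
pull_stream = PullAudioInputStream(callback, audioFormat)  # wrap the callback in a stream
audio_config = AudioConfig(stream=pull_stream)             # pass the stream, not the callback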

Packages and their versions:

Python 3.6.3, azure-cognitiveservices-speech 1.11.0

Translation from a file executes successfully, but I do not want to save a file for every chunk of bytes received.

Is it possible to pass custom audio bytes to the Azure Speech Translation Service and get the result in Python? If so, how?

Best Answer

I found the solution to my problem myself. I think it would also work with PullAudioInputStream, but it worked for me with PushAudioInputStream. You do not need to create custom classes; it works as follows:

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioStreamFormat, PullAudioInputStream, PullAudioInputStreamCallback, AudioConfig, PushAudioInputStream

from threading import Thread, Event


speech_key, service_region = "key", "region"

channels = 1
bitsPerSample = 16
samplesPerSecond = 16000
audioFormat = AudioStreamFormat(samplesPerSecond, bitsPerSample, channels)

translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)

fromLanguage = 'en-US'
toLanguage = 'hi'
translation_config.speech_recognition_language = fromLanguage
translation_config.add_target_language(toLanguage)

translation_config.voice_name = "hi-IN-Kalpana-Apollo"

# Remove Custom classes as they are not needed.

custom_push_stream = speechsdk.audio.PushAudioInputStream(stream_format=audioFormat)

audio_config = AudioConfig(stream=custom_push_stream)

recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config, audio_config=audio_config)

# Create an event
synthesis_done = Event()

def synthesis_callback(evt):
    size = len(evt.result.audio)
    print('AUDIO SYNTHESIZED: {} byte(s) {}'.format(size, '(COMPLETED)' if size == 0 else ''))
    if size > 0:
        t_sound_file = open("translated_output.wav", "wb+")
        t_sound_file.write(evt.result.audio)
        t_sound_file.close()
        # Set the event once the synthesized audio has been received
        synthesis_done.set()

def recognized_complete(evt):
    if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("RECOGNIZED '{}': {}".format(fromLanguage, evt.result.text))
        print("TRANSLATED into {}: {}".format(toLanguage, evt.result.translations['hi']))
    elif evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("RECOGNIZED: {} (text could not be translated)".format(evt.result.text))
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("NOMATCH: Speech could not be recognized: {}".format(evt.result.no_match_details))
    elif evt.result.reason == speechsdk.ResultReason.Canceled:
        print("CANCELED: Reason={}".format(evt.result.cancellation_details.reason))
        if evt.result.cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("CANCELED: ErrorDetails={}".format(evt.result.cancellation_details.error_details))


recognizer.synthesizing.connect(synthesis_callback)
recognizer.recognized.connect(recognized_complete)

# Read and get data from an audio file
open_audio_file = open("speech_wav_audio.wav", 'rb')
file_bytes = open_audio_file.read()

# Write the bytes to the stream
custom_push_stream.write(file_bytes)
custom_push_stream.close()

# Start the recognition
recognizer.start_continuous_recognition()

# Waiting for the event to complete
synthesis_done.wait()

# Once the event gets completed you can call Stop recognition
recognizer.stop_continuous_recognition()

I used an Event from threading because start_continuous_recognition runs in a different thread, and without threading you will not get the data from the callback events. synthesis_done.wait solves this by blocking until the event is set, and only then is stop_continuous_recognition called. Once you have the audio bytes, you can do whatever you want with them in synthesis_callback. I simplified the example and took the bytes from a wav file.
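Since the original goal was to avoid saving a file for every chunk of bytes received, the same push stream can also be fed incrementally instead of from one whole file. A minimal sketch, assuming a hypothetical get_next_chunk() that yields raw PCM chunks from whatever source you have (socket, queue, microphone driver):

def feed_stream(push_stream, get_next_chunk):
    while True:
        chunk = get_next_chunk()   # hypothetical source; returns b'' when done
        if not chunk:
            break
        push_stream.write(chunk)   # the recognizer consumes bytes as they arrive
    push_stream.close()            # close() signals end-of-stream to the SDK

You would typically run feed_stream on its own thread while recognition is in progress, e.g. Thread(target=feed_stream, args=(custom_push_stream, get_next_chunk)).start(), and keep the synthesis_done.wait() pattern above unchanged.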

Regarding python - custom audio input bytes for the Azure Cognitive Speech Translation Service in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61300431/
