python - My Azure text-to-speech application no longer produces output after adding an SSML string

Repost · Author: 行者123 · Updated: 2023-12-03 03:23:51

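I built a text-to-speech script with Microsoft Azure. Today I decided to add a pitch changer, a rate changer, and possibly some inserted silence. To do that I had to replace speak_text_async(text) with speak_ssml_async(ssml_string). Since making that change, the TTS no longer plays anything and no .wav file is generated. All I did was add a constant 50% pitch to test it, build the ssml_string, and switch the synthesizer call to SSML instead of text (otherwise it would just read the markup in the SSML aloud).

I changed speak_ssml_async back to speak_text_async while still passing ssml_string, which confirmed that the problem comes from ssml_string, but I can't figure out what it is because I don't get any error.

I'll leave the relevant part of the code here. Keep in mind that I have a custom output filename and a directory selector, and that the input widget for the TTS text is defined before this excerpt.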

        # Directory selector
        output_label = ttk.Label(self, text="Choose your output folder:",
                                 font=platformfont,
                                 style="Output.TLabel")
        output_label.pack(pady=2)
        self.output_dir_button = ttk.Button(self, text="Browse", command=self.choose_output_dir,
                                            takefocus=False,
                                            style="Custom.TButton")
        self.output_dir_button.pack()
        self.output_dir_path = tk.StringVar()
        self.output_dir_path.set("")
        self.output_dir_entry = tk.Entry(self, textvariable=self.output_dir_path, font=inputfont,
                                         width=55,
                                         foreground="#395578",
                                         state='readonly',
                                         background="light gray",
                                         readonlybackground="#Eed9c9",
                                         borderwidth=0,
                                         cursor="X_cursor",
                                         relief="flat")
        self.output_dir_entry.pack(pady=5)

        # Output filename
        output_filename_label = ttk.Label(self, text="Enter output filename (without extension):",
                                          font=platformfont,
                                          style="Output.TLabel")
        output_filename_label.pack(pady=5)

        # Listen button
        speak_button = ttk.Button(self, text="Listen & Generate", command=self.speak_text,
                                  takefocus=False,
                                  style="Custom.TButton")
        speak_button.pack(pady=15)

    def choose_output_dir(self):
        dir_path = filedialog.askdirectory()
        if dir_path:
            self.output_dir_path.set(dir_path)

    def speak_text(self):
        text = self.input_text.get("1.0", "end")
        output_dir = self.output_dir_path.get()
        output_filename = self.output_filename_text.get()
        if output_filename == "":
            output_filename = "tcnoutput"
        output_file = os.path.join(output_dir, output_filename + ".wav")

        if os.path.exists(output_file):
            response = messagebox.askyesnocancel("File Exists", "A file with the same name already exists. Do you want to overwrite it?",
                                                 icon='warning')
            if response == True:
                os.remove(output_file)
            elif response == False:
                i = 1
                while os.path.exists(os.path.join(output_dir, output_filename + f"({i})" + ".wav")):
                    i += 1
                output_filename = output_filename + f"({i})"
                output_file = os.path.join(output_dir, output_filename + ".wav")
            else:
                raise KeyboardInterrupt

        pitch = "+50.0%"
        ssml_string = f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='ro-RO'>" \
                      f"<prosody pitch='{pitch}'>{text}</prosody></speak>"

        speech_synthesis_result = self.speech_synthesizer.speak_ssml_async(ssml_string).get()
        if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            with open(output_file, "wb") as f:
                f.write(speech_synthesis_result.audio_data)
            if output_dir == "":
                output_final = os.getcwd() + "\\" + output_filename + ".wav"
            else:
                output_final = output_dir + "/" + output_filename + ".wav"
            messagebox.showinfo("Success", f"Audio file successfully saved at: {output_final}")
        else:
            messagebox.showerror("Error", "Speech synthesis failed.")

Best answer

I tried the Python code below to convert text to speech with the voice settings configured through SSML, and got the desired audio output:

Code:

import io
import wave

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="key", region="region")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
ssml_string = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-JennyNeural'><prosody pitch='+50%'>Hello, my friend! How are you?</prosody></voice></speak>"
result = synthesizer.speak_ssml_async(ssml_string).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("SSML string is correct")
else:
    # The Python SDK exposes failure information on cancellation_details
    # (it may be None if the result was not canceled).
    details = result.cancellation_details
    print("SSML string is incorrect: {}".format(details.error_details if details else result.reason))
# Write the returned bytes to test.wav as 16 kHz, 16-bit, mono audio.
with io.BytesIO(result.audio_data) as compressedAudioStream:
    with wave.open("test.wav", "wb") as wavFile:
        wavFile.setnchannels(1)
        wavFile.setsampwidth(2)
        wavFile.setframerate(16000)
        wavFile.writeframes(compressedAudioStream.read())

Output:
The audio for the input text is generated in the WAV file.
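The question mentions that no error is shown when synthesis fails, while the code above prints the failure details. A minimal sketch of a helper that turns a failed result into a readable message follows. The name describe_failure is hypothetical, and the attribute names (reason, cancellation_details, error_details) are assumed to match the Python Speech SDK result object. A stand-in object is used so the sketch runs without the SDK or an Azure key.

```python
from types import SimpleNamespace


def describe_failure(result):
    """Return a readable message for a synthesis result that did not complete.

    Assumes `result` exposes `reason` and `cancellation_details`
    (with `reason` and `error_details`), as the Speech SDK result is expected to.
    """
    details = getattr(result, "cancellation_details", None)
    if details is None:
        return f"Speech synthesis failed (reason: {result.reason})."
    message = f"Speech synthesis canceled (reason: {details.reason})."
    if getattr(details, "error_details", None):
        message += f" Details: {details.error_details}"
    return message


# Stand-in object (not a real SDK result) so the sketch runs offline.
fake_result = SimpleNamespace(
    reason="Canceled",
    cancellation_details=SimpleNamespace(
        reason="Error",
        error_details="hypothetical error text from the service",
    ),
)
print(describe_failure(fake_result))
```

In the asker's speak_text method, the else branch could pass this message to messagebox.showerror instead of the fixed "Speech synthesis failed." text, which would likely reveal why the SSML was rejected.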

  • Refer to these two Microsoft documents for configuring SSML: Doc1 & Doc2
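One visible difference between the working SSML above and the string in the question is the voice element, which Azure's SSML appears to require inside speak. The question's string also inserts the raw widget text, so a character such as & or < would make the XML invalid. The sketch below is one possible way to build the string under those assumptions. The function name build_ssml and its parameters are hypothetical, and ro-RO-AlinaNeural is an assumed voice name that should be checked against the voice list for your region.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="ro-RO-AlinaNeural", lang="ro-RO",
               pitch="+50%", rate=None, pause_ms=0):
    """Wrap plain text in an SSML document with voice and prosody settings."""
    # Tk's Text.get("1.0", "end") appends a newline; strip it, then escape
    # &, < and > so user input cannot break the XML.
    body = escape(text.strip())

    # Build the prosody attributes; rate is only emitted when given.
    prosody_attrs = f"pitch={quoteattr(pitch)}"
    if rate is not None:
        prosody_attrs += f" rate={quoteattr(rate)}"

    # Optional leading silence via the SSML <break> element.
    pause = ""
    if pause_ms > 0:
        pause_value = quoteattr(f"{pause_ms}ms")
        pause = f"<break time={pause_value}/>"

    return (
        "<speak version='1.0' "
        "xmlns='http://www.w3.org/2001/10/synthesis' "
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"{pause}<prosody {prosody_attrs}>{body}</prosody>"
        "</voice></speak>"
    )


ssml_string = build_ssml("Tom & Jerry <3\n", pause_ms=500)
print(ssml_string)
```

The resulting ssml_string can be passed to speak_ssml_async in place of the hand-built f-string. Keeping the escaping in one function means the pitch, rate, and pause settings from the GUI can change without touching the XML structure.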

Regarding "python - My Azure text-to-speech application no longer produces output after adding an SSML string", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76108110/
