python - Why does a PyAudio stream's volume sometimes 'halve' when the 'converted bytes' are divided by 2, and other times turn into unwanted white noise?

Reposted. Author: 行者123. Updated: 2023-12-05 04:45:47
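I'm a 'newbie' with the PyAudio module in Python 3.7, and I have done some 'neat' things with the PyAudio interface. I have figured out how to generate and play several 'custom pitches' sequenced together, where the data is first converted to the range -32768 to +32768 (using int(n).to_bytes() and then n = data.from_bytes() to convert back and forth between bytes and integers, changing the values, then converting back to bytes for the stream).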

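While my values are integers, I can divide the volume of my 'custom pitches' by 2 to 'halve' it. However, when I divide by 2 after setting 'n' (the integer variable that holds the value) to the converted data of my 'sounds\hello.wav' file, it does not 'halve' the volume; it creates unwanted white noise instead. If I do not divide by 2, my 'sounds\hello.wav' file plays fine.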
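One way to see where the noise can come from, assuming 'sounds\hello.wav' holds 16-bit PCM (the question does not state the bit depth): 16-bit WAV samples are signed and little-endian, while int.from_bytes(data, 'big') reads them as unsigned and big-endian. Converting straight back with to_bytes(2, 'big') restores the same bytes, so the untouched file plays fine, but any arithmetic in between operates on the wrong number. The sketch below uses hypothetical helper names and a single quiet sample to show the effect.

```python
def halve_as_in_question(frame: bytes) -> bytes:
    """Decode as unsigned big-endian, halve, re-encode (the question's 4th option)."""
    v = int.from_bytes(frame, "big")
    return int(v / 2).to_bytes(2, "big")


def halve_as_signed_le(frame: bytes) -> bytes:
    """Decode as signed little-endian (16-bit PCM layout), halve, re-encode."""
    v = int.from_bytes(frame, "little", signed=True)
    return int(v / 2).to_bytes(2, "little", signed=True)


def as_sample(frame: bytes) -> int:
    """Interpret two bytes the way a 16-bit PCM output stream would."""
    return int.from_bytes(frame, "little", signed=True)


# A very quiet sample: the value -2 stored as 16-bit PCM is b'\xfe\xff'.
quiet = (-2).to_bytes(2, "little", signed=True)

# Without arithmetic, the mis-decoded round trip returns the same bytes.
untouched = int.from_bytes(quiet, "big").to_bytes(2, "big")

print(untouched == quiet)                       # True
print(as_sample(halve_as_in_question(quiet)))   # 32639, close to full scale
print(as_sample(halve_as_signed_le(quiet)))     # -1
```

Under that assumption, a sample that should be almost silent ends up near the maximum amplitude, which would be heard as noise rather than as a quieter signal.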

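My comments in 'ALL CAPS' mark where the 'problem' is. The 'capitalized comment section' shows four different 'options' that can be used for the value of 'n' before n is converted to bytes and written to the stream. Three of these four 'options' work, but I have been struggling to figure out why the 'fourth option' gives me a 'problem'. The 'four options' that 'reproduce the problem' are the reason my code generates two warnings; that is not a 'problem with the program'. What I am working on might one day help create a completely new sound and music technology. Here is my code...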

import math
import time
import wave

import pyaudio

pitches = 0
position = []
start = time.time()
started = True
oldTime = 0
delta = 0
run_time = 0
val = []
lastVal = []
lastVal2 = []
count = 0

def get_pitches():
    global val
    global run_time
    global lastVal
    global lastVal2
    global position
    global pitches

    n = 0
    val = []
    pitches = 0

    # Store the offset and the increment (through time) into result.
    run_time += delta

    # PITCHES GO HERE.
    n += add_pitch_with_time_stamp(offset = 0.0, increment = 0.0, volume = 0.5, pitch_stamp=[0.01, 0.015, 0.02, 0.015], time_stamp=[.5, .5, .5, .5], transition_time_stamp = [10, 10, 10, 10], voice = "sounds\\ah.wav")
    n += add_pitch_with_time_stamp(offset = 0.0, increment = 0.0, volume = 0.5, pitch_stamp=[0.01, 0.02, 0.03, 0.02], time_stamp=[1, 1, 1, 1], transition_time_stamp = [10, 10, 10, 10], voice = "sounds\\ah.wav")
    #n += add_pitch(offset = .01, increment = .1, volume = 1)
    #n += add_pitch(offset = 0.015, increment = -.001, volume = 1)
    #n += add_pitch(offset = 0.04, increment = 0, volume = 1)

    # Average out the pitches before returning n.
    if pitches != 0:
        n /= pitches

    return n

def add_pitch(offset, increment, volume):
    global pitches
    global delta
    global run_time
    global val
    global lastVal
    global lastVal2
    global position

    # Match the size of arrays for positions and last recorded values.
    if pitches >= len(position):
        position.append(0)
    if pitches >= len(lastVal):
        lastVal.append(0)
    if pitches >= len(lastVal2):
        lastVal2.append(0)

    # Get the calculated pitch for the wave.
    pitch = ((run_time - start) * increment) + offset

    # If the pitch is out of range set the result to 0.
    if 0.3 > pitch >= 0:
        if pitches < len(lastVal):
            lastVal2[pitches] = lastVal[pitches]

        val.append((1 + math.sin(((position[len(val) - 1]) * pitch) * math.pi * 2) * 0.5 * volume) - 0.5)

        if pitches < len(lastVal):
            lastVal[pitches] = val[len(val) - 1]
            result = ((val[len(val) - 1] * 0x7f) + 0x80)
        else:
            result = 0

        # Increase pitches per function call to determine the average value for n.
        pitches += 1
    else:
        result = 0

    return result

def add_pitch_with_time_stamp(offset, increment, volume, pitch_stamp, time_stamp=None, transition_time_stamp=None, voice=None):
    global pitches
    global delta
    global run_time
    global val
    global lastVal
    global lastVal2
    global position

    # Match size for time stamp.
    for i in range(0, len(time_stamp)):
        if (i + 1) > len(time_stamp):
            time_stamp.append(1)

    # Match size for transition time stamp.
    for i in range(0, len(pitch_stamp)):
        if (i + 1) > len(transition_time_stamp):
            transition_time_stamp.append(1)

    # Get a total time modulation from the time stamp.
    time_modulation = 0
    for i in range(0, len(time_stamp)):
        time_modulation += time_stamp[i]

    # Get the time index.
    time_flow = (time.time() - start) % time_modulation

    # Store the time transitions into f.
    f = get_transition_value(pitch_stamp, transition_time_stamp, time_stamp, time_flow)

    # Match the size of arrays for positions and last recorded values.
    if pitches >= len(position):
        position.append(0)
    if pitches >= len(lastVal):
        lastVal.append(0)
    if pitches >= len(lastVal2):
        lastVal2.append(0)

    # Get the calculated pitch for the wave.
    pitch = (((run_time - start) * increment) + (f + offset))

    # If the pitch is out of range set the result to 0.
    if 0.3 > pitch >= 0:
        if pitches < len(lastVal):
            lastVal2[pitches] = lastVal[pitches]

        #print (data2)

        if voice is None:
            val.append((1 + math.sin(((position[len(val) - 1]) * pitch) * math.pi * 2) * 0.5 * volume) - 0.5)
        else:
            val.append((1 + math.sin(((position[len(val) - 1]) * pitch) * math.pi * 2) * 0.5 * volume) - 0.5)

        if pitches < len(lastVal):
            lastVal[pitches] = val[len(val) - 1]
            result = ((val[len(val) - 1] * 0x7f) + 0x80)
        else:
            result = 0

        # Increase pitches per function call to determine the average value for n.
        pitches += 1
    else:
        result = 0

    return result

def get_transition_value(value_list, transition_list, t_stamp, t_flow):
    t_total = 0
    t_position = t_flow
    t_index = 0
    for i in range(0, len(t_stamp)):
        t_total += t_stamp[i]
        if t_flow >= t_total:
            t_position -= t_stamp[i]
            t_index = i + 1

    # t_process is the fraction of time between each transition.
    t_process = t_position / t_stamp[t_index]

    # Get the current value from the time stamp.
    v_floor = value_list[t_index % len(value_list)]

    # Get the next value from the time stamp.
    v_ceil = value_list[(t_index + 1) % len(value_list)]

    # Determine the 'power' between each transition
    transform_power = transition_list[int(t_flow) % len(value_list)]

    return transition(v_floor, v_ceil, math.pow(t_process % 1, transform_power))

def transition(down, up, mid):
    # Another function for finding in between values.
    return (down * (1 - mid)) + (up * mid)

def get_delta_time():
    # Store the delta time into a delta variable.
    global delta
    global oldTime
    delta = time.time()-oldTime
    oldTime = time.time()

def do_pitches():
    global pitches
    global position
    global started
    global lastVal
    global lastVal2
    global count
    global delta

    # Create an interface to PortAudio
    p = pyaudio.PyAudio()

    wf = wave.open("sounds\\hello.wav", 'rb')

    # Open a .Stream object to write the WAV file to
    # 'output = True' indicates that the sound will be played rather than recorded
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()), # 8bit
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)

    try:
        while True:
            # Make a variable called 'n', and set it to 'silent' (0).
            pitches = 0

            # Store pitches in n.
            n: int = 0
            if started:
                position = []
                lastVal = []
                lastVal2 = []
                #n = get_pitches()
                for i in range(0, pitches):
                    position.append(0)
                    lastVal.append(0)
                    lastVal2.append(0)
                started = False

            # Read the voice data 1 frame at a time.
            data2 = wf.readframes(1)

            # Convert the data from byte format, into an integer value ranging from -32768 to 32768.
            v = int.from_bytes(data2, 'big')

            # NOTE:
            #
            # If I use this line only, without dividing the value of n by 2,
            # it works fine.
            #
            # This line takes the values of all the pitches (averaged) placed
            # in the get_pitches() function before later converting 'n' to a
            # byte value (called 'data') and then writing 'data' to the stream.
            #
            n = (transition(n, get_pitches(), 1))

            # NOTE:
            #
            # If I use this line only, without dividing the value of n by 2,
            # it works fine.
            #
            # This line will play a .wav file called 'sound\hello.wav' before
            # later converting n to a byte value (called data) and writing
            # data to the stream.
            #
            n = v

            # NOTE:
            #
            # If I use this line only, dividing the value of n by 2, it
            # works fine.
            #
            # This line takes the values of all the pitches (averaged) in
            # the get_pitches() function ... and will 'halve' the volume
            # (as it is supposed to do since i 'half-ed' the value).
            #
            # The value of n later gets converted to a byte value (called
            # 'data') and gets written to the stream.
            #
            n = (transition(n, get_pitches(), 1)) / 2

            # NOTE:
            #
            # ***problem***: if I use this line only, and dividing the value
            # of n by 2, this produces unwanted white noise instead of
            # 'halving' the volume, even though i divided the value
            # of n by 2.
            #
            # This line should play a .wav file called 'sound\hello.wav'
            # before later converting n to a byte value (called data)
            # and writing data to the stream.
            #
            n = int(v / 2)

            # Convert the value of 'n' into bytes.
            data = int(n).to_bytes(2, 'big')

            #wf.setpos(int((time.time() - start) % wf.getsampwidth()))
            #print (int(time.time() % wf.getsampwidth()))

            # Writing data to stream makes the sound.
            stream.write(data)

            # Write voice to voice stream.
            #stream2.write(data2)

            # Increment position so that the 'n' result (from getPitches)
            # produces a sine-wave.
            for i in range(0, len(position)):
                position[i] += 1

                # Limit each position to 1000 chunks to prevent popping.
                if count % 1000 == 0:
                    position[i] = 0

            get_delta_time()
            count += 1
    except KeyboardInterrupt:
        pass

    # In the case the while loop breaks.
    stream.close()
    p.terminate()

do_pitches()

Best answer

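I found the problem... I used a .wav file called 'top.wav', which contains 1 second of the maximum possible positive value a wav can hold, and another .wav file called 'bottom.wav', which has 1 second of the maximum possible negative value (I looked at the values these 2 .wav files produced so that I could fully understand how the 'byte system' works).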
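The answer does not say how these two files were made or what bit depth they had. As a rough sketch of the same experiment, assuming 16-bit mono PCM, an arbitrary 8000 Hz rate, a temporary directory, and a hypothetical helper named write_constant_wav, the standard wave module can write both files and show the raw bytes of their first frame:

```python
import os
import tempfile
import wave


def write_constant_wav(path, sample, seconds=1, rate=8000):
    """Write a mono 16-bit WAV in which every sample has the same value."""
    frame = sample.to_bytes(2, "little", signed=True)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono (assumption)
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit (assumption)
        wf.setframerate(rate)
        wf.writeframes(frame * (rate * seconds))


def first_frame(path):
    """Return the raw bytes of the first frame of a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.readframes(1)


tmp_dir = tempfile.mkdtemp()
top_path = os.path.join(tmp_dir, "top.wav")
bottom_path = os.path.join(tmp_dir, "bottom.wav")

write_constant_wav(top_path, 32767)      # largest positive 16-bit value
write_constant_wav(bottom_path, -32768)  # largest negative 16-bit value

top_frame = first_frame(top_path)
bottom_frame = first_frame(bottom_path)
print(top_frame, bottom_frame)
```

On a little-endian machine the low byte comes first in each frame, which is the detail that a big-endian int.from_bytes() call misreads.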

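Instead of converting the byte values of 'data' to an integer (using int.from_bytes()), I found a function called struct.unpack(), which takes the byte data and converts it, the right way, into a tuple that holds the value.

I get the actual value by using...

decoded[0]

...when using the code...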

...
fmt = "<" + "h"
if data != b'' and data != b'\x00\x00\x00\x00':
    decoded = struct.unpack(fmt, data)
if data == b'\x00\x00\x00\x00':
    decoded = (0, )
...
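As a self-contained illustration of that decoding step, the sketch below unpacks one frame with the same "<h" format (little-endian signed 16-bit), halves the value, and packs it back with struct.pack(). The function name halve_frame is hypothetical, and the frame is assumed to be exactly 2 bytes (one mono 16-bit sample); anything else is returned unchanged.

```python
import struct

FMT = "<h"  # one little-endian signed 16-bit sample, as in the answer


def halve_frame(frame: bytes) -> bytes:
    """Halve the amplitude of a single 16-bit mono frame."""
    if len(frame) != struct.calcsize(FMT):
        # e.g. b'' at the end of the file, or a frame of another size
        return frame
    decoded = struct.unpack(FMT, frame)   # always a tuple, here of length 1
    value = decoded[0]
    return struct.pack(FMT, int(value / 2))


loud = struct.pack(FMT, -20000)
quieter = halve_frame(loud)
print(struct.unpack(FMT, quieter)[0])   # -10000
```

struct.unpack() returns a tuple even for a single value, which is why the answer indexes the result with decoded[0].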

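Then I noticed that the values were 'scrambled': anything in the 'range' from 0 to 128 needed to be converted to 128 - (value - 1), and anything in the range from 129 to 256 needed to be converted to (256 - (value - 128)) - 1... so I had to write a function called 'invert_values'...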

...
# This function makes values 0 to 128, 128 to 0 and values 129 to 256, 256 to 129.
def invert_values(n):
    if n < 128:
        n = 128 - (n - 1)
    if 128 <= n < 256:
        n = (256 - (n - 128)) - 1
    return n
...

After doing my arithmetic, and using...

n = invert_values(n)

...before converting n back to a byte value, my wavs play fine. When I divide by 2, my volume is 'halved'.
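The question's loop reads and writes one frame at a time. A possible variation, not taken from the answer, is to scale a whole chunk of frames in one call. The sketch below assumes 16-bit PCM (mono or interleaved stereo, since every sample is scaled independently), uses a hypothetical function name scale_chunk, and clamps the result so that a factor above 1 cannot overflow the 16-bit range.

```python
import struct


def scale_chunk(chunk: bytes, factor: float) -> bytes:
    """Scale every 16-bit little-endian signed sample in chunk by factor."""
    count = len(chunk) // 2                  # number of complete samples
    fmt = f"<{count}h"
    samples = struct.unpack(fmt, chunk[:count * 2])
    scaled = [max(-32768, min(32767, int(s * factor))) for s in samples]
    return struct.pack(fmt, *scaled)


chunk = struct.pack("<4h", 1000, -1000, 32767, -32768)
print(struct.unpack("<4h", scale_chunk(chunk, 0.5)))
print(struct.unpack("<4h", scale_chunk(chunk, 2.0)))
```

With this shape, the bytes returned by wf.readframes(1024) could be passed through scale_chunk(data, 0.5) before stream.write(), under the same 16-bit assumption.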

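Regarding "python - Why does a PyAudio stream's volume sometimes 'halve' when the 'converted bytes' are divided by 2, and other times turn into unwanted white noise?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69045547/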
