gpt4 book ai didi

python - 如何在时域计算基音基频f(0))?

转载 作者:行者123 更新时间:2023-12-02 22:52:22 32 4
gpt4 key购买 nike

我是 DSP 新手,尝试计算音频文件每个分段帧的基频 (f(0))。 F0估计的方法可以分为三类:

  • 基于信号时域的时间动态;
  • 基于频率结构频域,以及
  • 混合方法。

大多数示例都是基于频率结构频域来估计基频,我正在寻找基于信号时域的时间动态的基础频率。

This article提供了一些信息,但我仍然不清楚如何在时域中计算它?

https://gist.github.com/endolith/255291

这是我找到的、迄今为止使用过的代码:

def freq_from_autocorr(sig, fs):
"""
Estimate frequency using autocorrelation
"""
# Calculate autocorrelation and throw away the negative lags
corr = correlate(sig, sig, mode='full')
corr = corr[len(corr)//2:]

# Find the first low point
d = diff(corr)
start = nonzero(d > 0)[0][0]

# Find the next peak after the low point (other than 0 lag). This bit is
# not reliable for long signals, due to the desired peak occurring between
# samples, and other peaks appearing higher.
# Should use a weighting function to de-emphasize the peaks at longer lags.
peak = argmax(corr[start:]) + start
px, py = parabolic(corr, peak)

return fs / px

如何在时域进行估计?

提前致谢!

最佳答案

这是一个正确的实现。不是很健壮,但肯定有效。为了验证这一点,我们可以生成一个已知频率的信号,看看我们将得到什么结果:

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate, fftconvolve
from scipy.interpolate import interp1d

fs = 44100
frequency = 440
length = 0.01 # in seconds

t = np.linspace(0, length, int(fs * length))
y = np.sin(frequency * 2 * np.pi * t)

def parabolic(f, x):
xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
return (xv, yv)

def freq_from_autocorr(sig, fs):
"""
Estimate frequency using autocorrelation
"""
corr = correlate(sig, sig, mode='full')
corr = corr[len(corr)//2:]
d = np.diff(corr)
start = np.nonzero(d > 0)[0][0]
peak = np.argmax(corr[start:]) + start
px, py = parabolic(corr, peak)

return fs / px

结果

运行 freq_from_autocorr(y, fs) 得到 ~442.014 Hz,大约有 0.45% 的误差。

奖励 - 我们可以改进

我们可以通过稍微多一点的编码来使其更加精确和健壮:

def indexes(y, thres=0.3, min_dist=1, thres_abs=False):
"""Peak detection routine borrowed from
https://bitbucket.org/lucashnegri/peakutils/src/master/peakutils/peak.py
"""
if isinstance(y, np.ndarray) and np.issubdtype(y.dtype, np.unsignedinteger):
raise ValueError("y must be signed")

if not thres_abs:
thres = thres * (np.max(y) - np.min(y)) + np.min(y)

min_dist = int(min_dist)

# compute first order difference
dy = np.diff(y)

# propagate left and right values successively to fill all plateau pixels (0-value)
zeros, = np.where(dy == 0)

# check if the signal is totally flat
if len(zeros) == len(y) - 1:
return np.array([])

if len(zeros):
# compute first order difference of zero indexes
zeros_diff = np.diff(zeros)
# check when zeros are not chained together
zeros_diff_not_one, = np.add(np.where(zeros_diff != 1), 1)
# make an array of the chained zero indexes
zero_plateaus = np.split(zeros, zeros_diff_not_one)

# fix if leftmost value in dy is zero
if zero_plateaus[0][0] == 0:
dy[zero_plateaus[0]] = dy[zero_plateaus[0][-1] + 1]
zero_plateaus.pop(0)

# fix if rightmost value of dy is zero
if len(zero_plateaus) and zero_plateaus[-1][-1] == len(dy) - 1:
dy[zero_plateaus[-1]] = dy[zero_plateaus[-1][0] - 1]
zero_plateaus.pop(-1)

# for each chain of zero indexes
for plateau in zero_plateaus:
median = np.median(plateau)
# set leftmost values to leftmost non zero values
dy[plateau[plateau < median]] = dy[plateau[0] - 1]
# set rightmost and middle values to rightmost non zero values
dy[plateau[plateau >= median]] = dy[plateau[-1] + 1]

# find the peaks by using the first order difference
peaks = np.where(
(np.hstack([dy, 0.0]) < 0.0)
& (np.hstack([0.0, dy]) > 0.0)
& (np.greater(y, thres))
)[0]

# handle multiple peaks, respecting the minimum distance
if peaks.size > 1 and min_dist > 1:
highest = peaks[np.argsort(y[peaks])][::-1]
rem = np.ones(y.size, dtype=bool)
rem[peaks] = False

for peak in highest:
if not rem[peak]:
sl = slice(max(0, peak - min_dist), peak + min_dist + 1)
rem[sl] = True
rem[peak] = False

peaks = np.arange(y.size)[~rem]

return peaks

def freq_from_autocorr_improved(signal, fs):
signal -= np.mean(signal) # Remove DC offset
corr = fftconvolve(signal, signal[::-1], mode='full')
corr = corr[len(corr)//2:]

# Find the first peak on the left
i_peak = indexes(corr, thres=0.8, min_dist=5)[0]
i_interp = parabolic(corr, i_peak)[0]

return fs / i_interp, corr, i_interp

运行freq_from_autocorr_improved(y, fs)会产生~441.825 Hz,大约有0.41%的误差。对于更复杂的情况,此方法的性能会更好,并且计算时间会延长两倍。

通过更长的采样时间(即将length设置为0.1s),我们将获得更准确的结果。

关于python - 如何在时域计算基音基频f(0))?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61534687/

32 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com