
Python audio signal classification with MFCC features and a neural network


I am trying to classify audio signals from speech into emotions. For that I am extracting MFCC features of the audio signal and feeding them into a simple neural network (a FeedForwardNetwork trained with PyBrain's BackpropTrainer). Unfortunately the results are very bad: out of the 5 classes, the network almost always predicts the same one.

I have 5 emotion classes and around 7000 labeled audio files, which I split per class into 80% for training the network and 20% for testing it.

The idea is to use small windows and extract MFCC features from each of them, generating a lot of training examples. For evaluation, all windows of a file are classified and a majority vote decides the predicted label.
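That evaluation step might look like the following minimal sketch; `predict_window` is a hypothetical stand-in for whatever maps one window's feature vector to a class index and is not from the original post:

import numpy as np

def predict_file(windows, predict_window):
    """Majority vote over the per-window predictions of one audio file.

    windows        -- iterable of MFCC feature vectors, one per window
    predict_window -- callable mapping one feature vector to an integer class index
    """
    votes = np.array([predict_window(w) for w in windows])
    # the class predicted most often across all windows wins
    return np.bincount(votes).argmax()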

Training examples per class: 
{0: 81310, 1: 60809, 2: 58262, 3: 105907, 4: 73182}

Example of scaled MFCC features:
[ -6.03465056e-01 8.28665733e-01 -7.25728303e-01 2.88611116e-05
1.18677218e-02 -1.65316583e-01 5.67322809e-01 -4.92335095e-01
3.29816126e-01 -2.52946780e-01 -2.26147779e-01 5.27210979e-01
-7.36851560e-01]

Layers________________________: 13 20 5 (also tried 13 50 5 and 13 100 5)
Learning Rate_________________: 0.01 (also tried 0.1 and 0.3)
Training epochs_______________: 10 (error rate does not improve at all during training)
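For reference, a 13-20-5 softmax network with these settings could be set up in PyBrain roughly as follows. This is a sketch, not the asker's actual code; filling the dataset with the windowed feature vectors is assumed to happen elsewhere:

from pybrain.datasets import ClassificationDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import SoftmaxLayer
from pybrain.supervised.trainers import BackpropTrainer

# 13 MFCC inputs, 5 emotion classes; one appendLinked(features, [label])
# call per training window is assumed to happen here
trndata = ClassificationDataSet(13, 1, nb_classes=5)
trndata._convertToOneOfMany()  # one-hot targets for the softmax output

net = buildNetwork(13, 20, 5, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, dataset=trndata, learningrate=0.01)
trainer.trainEpochs(10)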

Truth table on test set:
[[ 0. 4. 0. 239. 99.]
[ 0. 41. 0. 157. 23.]
[ 0. 18. 0. 173. 18.]
[ 0. 12. 0. 299. 59.]
[ 0. 0. 0. 85. 132.]]

Success rate overall [%]: 34.7314201619
Success rate Class 0 [%]: 0.0
Success rate Class 1 [%]: 18.5520361991
Success rate Class 2 [%]: 0.0
Success rate Class 3 [%]: 80.8108108108
Success rate Class 4 [%]: 60.8294930876

Okay. As you can see, the distribution of the results over the classes is very bad: classes 0 and 2 are never predicted. I assume this points to a problem with my network, or more likely with my data.
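For reference, both the overall and the per-class success rates above follow directly from the confusion matrix (rows are the true class, columns the predicted class):

import numpy as np

conf = np.array([[  0.,  4., 0., 239.,  99.],
                 [  0., 41., 0., 157.,  23.],
                 [  0., 18., 0., 173.,  18.],
                 [  0., 12., 0., 299.,  59.],
                 [  0.,  0., 0.,  85., 132.]])

overall = conf.trace() / conf.sum() * 100             # 34.73...
per_class = conf.diagonal() / conf.sum(axis=1) * 100  # 0.0, 18.55, 0.0, 80.81, 60.83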

I could post a lot of code here, but I think it makes more sense to show all the steps I take to get to the MFCC features in the image below. Please note that I use the whole signal, without windowing, just for illustration. Does this look okay? The MFCC values are very large; shouldn't they be much smaller? (Before feeding them into the network I scale all data to [-2, 2] with a MinMaxScaler; I also tried [0, 1].)

[Image: Steps from signal to MFCC]
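The scaling just described could be done with scikit-learn's MinMaxScaler, for example. A sketch; `X_train` and `X_test` are placeholders for the raw MFCC matrices and are not from the original post:

from sklearn.preprocessing import MinMaxScaler

# X_train, X_test: 2-D arrays of raw MFCC vectors, one row per window
scaler = MinMaxScaler(feature_range=(-2, 2))
X_train_scaled = scaler.fit_transform(X_train)
# fitting on the training data only avoids leaking test-set statistics
X_test_scaled = scaler.transform(X_test)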

This is the code I use for the mel filter bank, which I apply directly before the discrete cosine transform to extract the MFCC features (I got it from here: stackoverflow):

import math
import numpy

# mfccFeatures, minHz and maxHz are module-level configuration globals

def freqToMel(freq):
    '''
    Calculate the Mel frequency for a given frequency
    '''
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    '''
    Calculate the frequency for a given Mel frequency (inverse of freqToMel)
    '''
    return 700 * (math.exp(mel / 1127.01048) - 1)

def melFilterBank(blockSize):
    numBands = int(mfccFeatures)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(xrange(numBands + 2))

    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # each array index represents the center of one triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get integer bin indices

    for i in xrange(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(xrange(start, centre)) - start) / k1
        down = (end - numpy.array(xrange(centre, end))) / k2

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()
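For completeness, here is a minimal sketch of how such a filter bank is typically combined with the FFT and the discrete cosine transform to produce MFCCs. `extractMFCC` is hypothetical and relies on the melFilterBank function above:

from scipy.fftpack import dct

def extractMFCC(frame):
    # power spectrum of one (already windowed) frame
    spectrum = numpy.abs(numpy.fft.rfft(frame)) ** 2
    # apply the triangular mel filters, then log-compress the band energies
    melEnergies = numpy.log(numpy.dot(spectrum, melFilterBank(len(spectrum))) + 1e-10)
    # the DCT decorrelates the log energies into cepstral coefficients
    return dct(melEnergies, type=2, norm='ortho')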

How can I get better prediction results?

Best Answer

Here I put together an example of recognizing gender from speech, using the Hyke dataset¹. It is just a quickly made example; anyone wanting to do serious gender recognition could probably do better. But in general the error rate does go down during training:

Build up data...
Train network...
Number of training patterns: 94956
Number of test patterns: 31651
Input and output dimensions: 13 2
Train network...
epoch: 0 train error: 62.24% test error: 61.84%
epoch: 1 train error: 34.11% test error: 34.25%
epoch: 2 train error: 31.11% test error: 31.20%
epoch: 3 train error: 30.34% test error: 30.22%
epoch: 4 train error: 30.76% test error: 30.75%
epoch: 5 train error: 30.65% test error: 30.72%
epoch: 6 train error: 30.81% test error: 30.79%
epoch: 7 train error: 29.38% test error: 29.45%
epoch: 8 train error: 31.92% test error: 31.92%
epoch: 9 train error: 29.14% test error: 29.23%

I used the MFCC implementation from scikits.talkbox. Maybe the code below helps you. (Gender recognition is surely a much easier task than emotion detection... maybe you need more and different features.)

import glob

from scipy.io.wavfile import read as wavread
from scikits.talkbox.features import mfcc

from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer

def report_error(trainer, trndata, tstdata):
    trnresult = percentError(trainer.testOnClassData(), trndata['class'])
    tstresult = percentError(trainer.testOnClassData(dataset=tstdata), tstdata['class'])
    print "epoch: %4d" % trainer.totalepochs, " train error: %5.2f%%" % trnresult, " test error: %5.2f%%" % tstresult

def main(audio_path, coeffs=13):
    dataset = ClassificationDataSet(coeffs, 1, nb_classes=2, class_labels=['male', 'female'])
    male_files = glob.glob("%s/male_audio/*/*_1.wav" % audio_path)
    female_files = glob.glob("%s/female_audio/*/*_1.wav" % audio_path)

    print "Build up data..."
    for sex, files in enumerate([male_files, female_files]):
        for f in files:
            sr, signal = wavread(f)
            # one row of cepstral coefficients per analysis window
            ceps, mspec, spec = mfcc(signal, nwin=2048, nfft=2048, fs=sr, nceps=coeffs)
            for i in range(ceps.shape[0]):
                dataset.appendLinked(ceps[i], [sex])

    tstdata, trndata = dataset.splitWithProportion(0.25)
    trndata._convertToOneOfMany()
    tstdata._convertToOneOfMany()

    print "Number of training patterns: ", len(trndata)
    print "Number of test patterns: ", len(tstdata)
    print "Input and output dimensions: ", trndata.indim, trndata.outdim

    print "Train network..."
    fnn = buildNetwork(coeffs, int(coeffs * 1.5), 2, outclass=SoftmaxLayer, fast=True)
    trainer = BackpropTrainer(fnn, dataset=trndata, learningrate=0.005)

    report_error(trainer, trndata, tstdata)
    for i in range(100):
        trainer.trainEpochs(1)
        report_error(trainer, trndata, tstdata)

if __name__ == '__main__':
    main("/path/to/hyke/audio_data")


¹ Azarias Reda, Saurabh Panjwani and Edward Cutrell: Hyke: A Low-cost Remote Attendance Tracking System for Developing Regions, 5th ACM Workshop on Networked Systems for Developing Regions (NSDR).

On the topic of Python audio signal classification with MFCC features and a neural network, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32304432/
