python - How to combine mfcc vectors with labels from annotations to pass to a neural network

Reposted by: 太空宇宙. Updated: 2023-11-03 14:01:58

I used librosa to create MFCCs for my audio file, as follows:

import librosa
y, sr = librosa.load('myfile.wav')
print(y)
print(sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr)

I also have a text file containing manual annotations [start, stop, tag] that correspond to the audio, as follows:

0.0 2.0 sound1
2.0 4.0 sound2
4.0 6.0 silence
6.0 8.0 sound1

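Question: how do I combine the mfcc generated by librosa with the annotations from the text file?

The end goal is to pair each mfcc with its corresponding label and pass it to a neural network.
The neural network will then have the mfcc and the corresponding labels as training data.

If the data were one-dimensional, I could have N columns with N values and a final Y column holding the class label. But I am confused about how to proceed, because the mfcc has a shape like (16, X) or (20, Y). So I don't know how to combine the two.

My sample mfcc is here: https://gist.github.com/manbharae/0a53f8dfef6055feef1d8912044e1418

Please help. Thank you.

Update: the goal is to train a neural network so that it can recognize a new sound when it encounters one in the future.

I googled and found that mfcc works very well for speech. My audio contains speech, but I want to recognize the non-speech sounds. Are there other recommended audio features for general audio classification/recognition tasks?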

Best answer

Try the following. The explanation is included in the code.

import numpy
import librosa

# The following function returns a label index for a point in time (tp).
# This is pseudocode for you to complete.
def getLabelIndexForTime(tp):
    # Search the loaded annotations for the label that corresponds to the given time,
    # then convert the label to an index that represents its unique value in the set,
    # i.e. 'sound1' = 0, 'sound2' = 1, ...
    # print(tp)  # for debug
    label_index = 0  # replace with the logic above
    return label_index


if __name__ == '__main__':
    # Load the waveform samples and convert them to mfcc
    raw_samples, sample_rate = librosa.load('Front_Right.wav')
    mfcc = librosa.feature.mfcc(y=raw_samples, sr=sample_rate)
    print('Wave duration is %4.2f seconds' % (len(raw_samples) / float(sample_rate)))

    # Create the network's input training data, X.
    # mfcc is organized (feature, sample) but the net needs (sample, feature),
    # so X is mfcc reorganized to (sample, feature).
    X = numpy.moveaxis(mfcc, 1, 0)
    print('mfcc.shape:', mfcc.shape)
    print('X.shape:   ', X.shape)

    # Note that 512 samples is the default 'hop_length' used in calculating
    # the mfcc, so each mfcc spans 512/sample_rate seconds.
    mfcc_samples = mfcc.shape[1]
    mfcc_span = 512 / float(sample_rate)
    print('MFCC calculated duration is %4.2f seconds' % (mfcc_span * mfcc_samples))

    # For 'n' network input samples, calculate the time point where they occur
    # and get the appropriate label index for them.
    # Use +0.5 to get the middle of the mfcc's point in time.
    # (By default librosa centers frame i on sample i * hop_length, so this
    # estimate sits half a hop later than the true frame center.)
    Y = []
    for sample_num in range(mfcc_samples):
        time_point = (sample_num + 0.5) * mfcc_span
        label_index = getLabelIndexForTime(time_point)
        Y.append(label_index)
    Y = numpy.array(Y)

    # Y now contains the network's output training values.
    # Note: for some nets you may need to convert this to one-hot format.
    print('Y.shape:   ', Y.shape)
    assert Y.shape[0] == X.shape[0]  # X and Y have the same number of samples

    # Train the net with something like...
    # model.fit(X, Y, ...   # i.e. for a Keras NN model
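
One possible way to fill in the getLabelIndexForTime stub is sketched below. It is only an illustration: the names ANNOTATION_TEXT, parse_annotations, SEGMENTS, STARTS, LABEL_TO_INDEX and get_label_index_for_time are made up for this example, the annotation lines are copied from the question rather than read from a file, and segments are assumed not to overlap.

```python
from bisect import bisect_right

# Annotation lines in the "start stop label" format shown in the question.
# In practice this text would be read from the annotation file.
ANNOTATION_TEXT = """\
0.0 2.0 sound1
2.0 4.0 sound2
4.0 6.0 silence
6.0 8.0 sound1
"""


def parse_annotations(text):
    """Return a list of (start, stop, label) tuples sorted by start time."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, stop, label = line.split(None, 2)
        segments.append((float(start), float(stop), label.strip()))
    return sorted(segments)


SEGMENTS = parse_annotations(ANNOTATION_TEXT)
STARTS = [start for start, _, _ in SEGMENTS]

# Give each distinct label an integer index in order of first appearance,
# so 'sound1' -> 0, 'sound2' -> 1, 'silence' -> 2 for the data above.
LABEL_TO_INDEX = {}
for _, _, label in SEGMENTS:
    LABEL_TO_INDEX.setdefault(label, len(LABEL_TO_INDEX))


def get_label_index_for_time(tp, default=None):
    """Return the label index of the segment containing time tp (seconds).

    Segments are treated as half-open intervals [start, stop). Times that
    fall outside every segment return `default`.
    """
    pos = bisect_right(STARTS, tp) - 1  # last segment starting at or before tp
    if pos >= 0:
        start, stop, label = SEGMENTS[pos]
        if start <= tp < stop:
            return LABEL_TO_INDEX[label]
    return default
```

Frames that fall outside every annotated segment come back as `default`; whether to drop those frames or map them to a background class is a choice the source does not settle.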

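I should mention that the Y data here is intended for a network with a softmax output that can be trained with integer label data. Keras models accept this via the sparse_categorical_crossentropy loss function (I believe the loss function converts it to one-hot encoding internally). Other frameworks require the Y training labels to be delivered in one-hot encoded format, which is the more common case. There are many examples of how to do the conversion. In your case, you could do something like...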

# num_label_types is the number of distinct labels (e.g. 3 for sound1, sound2, silence)
Yoh = numpy.zeros(shape=(Y.shape[0], num_label_types), dtype='float32')
for i, val in enumerate(Y):
    Yoh[i, val] = 1.0
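
The same conversion can also be written without an explicit loop. The sketch below assumes Y is an integer array whose values all lie in the range 0 to num_label_types - 1; the sample values of Y and num_label_types are made up for illustration.

```python
import numpy

# Hypothetical example labels; in the answer's code Y is built per mfcc frame.
Y = numpy.array([0, 1, 2, 0])
num_label_types = 3  # assumed: sound1, sound2, silence

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with Y yields one row per sample.
Yoh = numpy.eye(num_label_types, dtype='float32')[Y]
```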

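As for whether mfcc is suitable for classifying non-speech, I would expect it to work, but you may want to experiment with its parameters. For example, librosa lets you pass something like n_mfcc=40, so you get 40 features instead of just 20. For fun, you could try replacing the mfcc with a simple FFT of the same size (512 samples) and see which works best.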
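
A rough sketch of the "simple FFT" alternative follows, using only numpy. The function name fft_features, the Hann window, the non-overlapping framing, and the synthetic 440 Hz test tone are all assumptions made for this example; the answer itself does not specify how the FFT should be computed.

```python
import numpy


def fft_features(samples, frame_length=512):
    """Split samples into non-overlapping frames and return the magnitude
    spectrum of each frame, shaped (n_frames, frame_length // 2 + 1)."""
    n_frames = len(samples) // frame_length
    # Drop the trailing partial frame, then put one frame on each row.
    frames = numpy.reshape(samples[:n_frames * frame_length],
                           (n_frames, frame_length))
    # A Hann window reduces spectral leakage at the frame edges.
    window = numpy.hanning(frame_length)
    return numpy.abs(numpy.fft.rfft(frames * window, axis=1))


# Synthetic stand-in for real audio: one second of a 440 Hz tone at 22050 Hz.
sample_rate = 22050
t = numpy.arange(sample_rate) / float(sample_rate)
samples = numpy.sin(2 * numpy.pi * 440.0 * t)

# X_fft is already organized (sample, feature), so no axis swap is needed.
X_fft = fft_features(samples)
```

With this framing, frame i covers samples i * 512 up to (i + 1) * 512, so its midpoint is (i + 0.5) * 512 / sample_rate seconds, which matches the time_point formula in the answer. The number of frames will generally differ from the mfcc frame count, so Y would need to be rebuilt for these frames.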

Regarding "python - How to combine mfcc vectors with labels from annotations to pass to a neural network", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48388641/
