Convert an encoded audio file to text with signal values

Reposted. Author: 行者123. Updated: 2023-12-04 11:28:36

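This is my first time programming with audio files in C. I found this code, which is supposed to read an audio file and then write a CSV file containing information for analysing the audio wave. In case it is a simple sound, I'm interested in the amplitude of the wave, the timbre of the sound, and its pitch and duration.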

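#define _POSIX_C_SOURCE 200809L  // popen()/pclose() are POSIX, not ISO C
#include <stdio.h>
#include <stdint.h>

// Create a 20 ms audio buffer (assuming Fs = 44.1 kHz): 0.020 * 44100 = 882 samples
#define N 882

int main(void)
{
    int16_t buf[N] = {0}; // buffer
    size_t count;         // number of samples actually read
    size_t n;             // buffer index

    // Open WAV file with FFmpeg and read raw samples via the pipe.
    // -f s16le: raw signed 16-bit little-endian, -ac 1: mono, "-": write to stdout
    FILE *pipein = popen("ffmpeg -i whistle.wav -f s16le -ac 1 -", "r");
    if (pipein == NULL) return 1;
    count = fread(buf, sizeof buf[0], N, pipein);
    pclose(pipein);

    // Print the sample values in the buffer to a CSV file
    FILE *csvfile = fopen("samples.csv", "w");
    if (csvfile == NULL) return 1;
    for (n = 0; n < count; ++n) fprintf(csvfile, "%d\n", buf[n]);
    fclose(csvfile);

    return 0;
}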

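Can someone explain in detail how to read an audio file so that I can extract the information I need from it? With reference to this code, can someone explain what the pipe in this line means: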

pipein = popen("ffmpeg -i whistle.wav -f s16le -ac 1 -", "r");

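P.S. I already know how to read the header of an audio file, which contains a lot of useful information, but I also want to analyse the whole audio file sample by sample.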

Best answer

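I just compiled and ran your code. The output file samples.csv is a single vertical column of signed 16-bit integers, each representing one sample of the input audio curve, for example (YMMV):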

-20724
-19681
-18556
-17359
-16096
-14766
-13383
-11940
-10460
-8928
-7371
-5778
-4165
-2536
-897
749
2385
4019
5633
7224
8793
10318
11811
13251
14644
15977
17247

... so once the raw audio is in your variable buf, you can add to the code above to answer your questions.

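Volume: audio is a curve, so when the curve is not wiggling the audio is silent. Understanding what bit depth means is essential when calculating volume. I suggest you open the output file in a text editor and look at each value. Knowing that you have a bit depth of 16 bits tells you the number of possible integer values; read up on PCM raw audio. As a first approximation, the following change to the code will tell you the volume: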

int min_value = INT16_MAX; // start at the extremes of int16_t (from <stdint.h>)
int max_value = INT16_MIN; // so that any sample can replace them

for (n = 0; n < N; ++n) {
    if (buf[n] < min_value) min_value = buf[n];
    if (buf[n] > max_value) max_value = buf[n];

    fprintf(csvfile, "%d\n", buf[n]);
}

fclose(csvfile);

printf("min_value %d\n", min_value);
printf("max_value %d\n", max_value);

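Knowing the bit depth of your audio, say 16 bits, you have 2^16 = 65536 possible distinct integers to represent the curve of your raw audio: from 0 to (65536 - 1) if your data is unsigned. If it is signed (as defined in the WAV file header; ffmpeg's s16le format is signed), that range is shifted so it is centred on zero, running from -32768 to (+32768 - 1), i.e. -32768 to +32767. So if your buf[n] values span the entire possible range from minimum to maximum, your audio sample segment can be said to be at full volume.

Now we can interpret the measurements above, min_value and max_value: if min_value is about -16384 and max_value is about +16384, the volume is about half of the maximum, because the audio uses only half of the possible range of integer values.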
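To make the range arithmetic concrete, here is a small sketch that computes the integer range for a given bit depth: 0 to 2^b - 1 for unsigned samples, and -2^(b-1) to 2^(b-1) - 1 for signed, zero-centred samples. The function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers: integer range of a PCM sample with the given
 * bit depth (valid for 1 <= bits <= 32). */

/* Unsigned samples: 0 .. 2^bits - 1 */
int64_t unsigned_max(int bits)
{
    return ((int64_t)1 << bits) - 1;
}

/* Signed (zero-centred) samples: -2^(bits-1) .. 2^(bits-1) - 1 */
int64_t signed_min(int bits)
{
    return -((int64_t)1 << (bits - 1));
}

int64_t signed_max(int bits)
{
    return ((int64_t)1 << (bits - 1)) - 1;
}
```

For bits = 16 this gives 0 to 65535 unsigned and -32768 to +32767 signed, matching the numbers above.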

So a volume in the range 0 to 1 (minimum to maximum volume) can be calculated, as an oversimplification, with this formula:

num_possible_ints = 2^bit_depth  // == 65536 for bit depth of 16 bits 
volume = 1 - ( num_possible_ints - ( max_value - min_value )) / num_possible_ints
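The formula simplifies to (max_value - min_value) / num_possible_ints. The sketch below wraps the min/max scan and this formula in one function, assuming 16-bit signed samples (so num_possible_ints = 65536); the function name is hypothetical. A buffer spanning -16384 to +16384 gives exactly 0.5, and a full-scale buffer gives 65535/65536, just under 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: peak-to-peak "volume" in [0, 1) using
 *   volume = 1 - (num_possible_ints - (max_value - min_value)) / num_possible_ints
 * Assumes 16-bit signed samples, so num_possible_ints = 2^16 = 65536. */
double peak_to_peak_volume(const int16_t *buf, size_t n)
{
    if (n == 0) return 0.0;        /* no samples: treat as silence */

    int min_value = INT16_MAX;     /* start at the extremes of int16_t */
    int max_value = INT16_MIN;
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] < min_value) min_value = buf[i];
        if (buf[i] > max_value) max_value = buf[i];
    }

    const double num_possible_ints = 65536.0;
    return 1.0 - (num_possible_ints - (max_value - min_value)) / num_possible_ints;
}
```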

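Why an oversimplification? Because without preprocessing your audio buffer (discarding, if needed, outlier samples that only rarely reach the maximum or minimum), this approach easily overestimates the volume.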
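One simple form of that preprocessing (a swapped-in technique, not something the answer specifies) is to ignore the most extreme fraction of samples: sort a copy of the buffer and read values near the ends, for example the 1st and 99th percentiles, instead of the raw minimum and maximum. The trim fraction and the function name below are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for int16_t */
static int cmp_int16(const void *a, const void *b)
{
    int16_t x = *(const int16_t *)a, y = *(const int16_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: like the min/max scan, but drops the most extreme
 * `trim` fraction of samples at each end (trim = 0.01 keeps roughly the
 * 1st..99th percentile). Returns 0 on success, -1 on empty input or
 * allocation failure. */
int trimmed_min_max(const int16_t *buf, size_t n, double trim, int *lo, int *hi)
{
    if (n == 0) return -1;
    int16_t *copy = malloc(n * sizeof *copy);
    if (copy == NULL) return -1;
    memcpy(copy, buf, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int16);

    size_t k = (size_t)(trim * (double)n);  /* samples dropped at each end */
    if (k >= n / 2) k = (n - 1) / 2;         /* always keep at least one */
    *lo = copy[k];
    *hi = copy[n - 1 - k];
    free(copy);
    return 0;
}
```

The trimmed lo/hi can then replace min_value/max_value in the volume formula, so a single click or spike no longer makes the segment look like full volume.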

There are better ways to measure volume, but keep in mind that volume is prone to perceptual bias. Look up Root Mean Square (RMS) to calculate volume more accurately. To quote:

RMS is averaging the area displaced by the signal, the area between the waveform and the linear zero line (not 0dB, but the axis).

As the waveform swings both above (+) and below (-) the centreline, the polarity of the swings has to be disregarded. Luckily, in maths, anything multiplied by itself (squaring) ends up positive. The signal can then be averaged (arithmetic mean over the timeline/window ED mentions or its integration time) as the positive and negative halves won't now cancel each other out, and finally the inverse of squaring is executed: the square root.

RMS just means root-mean-square or the square-root of the arithmetic mean of the square of the signal.

In reality, what it means is that a signal of high-amplitude, spikey, transient content can have the same RMS value as one of lower amplitude but fatter waveform, because they both have the same energy content. If you put them through a speaker, they should both generate the same acoustical energy output.

Typical spikey waveforms are things like drum transients, whereas fatter waveforms would be sine waves or even square waves (as fat as you can get), where a much lower peak level would be needed to have the same power (a sine wave of 1.4Vp has the same RMS level as a square wave of 1.0Vp).

...this should get you started.

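P.S. popen runs the ffmpeg command as a child process and returns a FILE * connected to its standard output, so your program reads the decoded audio as a stream. In that command, -i whistle.wav names the input file, -f s16le asks for raw signed 16-bit little-endian samples with no header, -ac 1 downmixes to a single (mono) channel, and the final - tells ffmpeg to write to standard output, i.e. into the pipe. fread then pulls those samples out of the pipe.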
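Because popen hands back an ordinary FILE *, the same stream can be read to the end in fixed-size chunks, which is one way to analyse the whole file sample by sample instead of only the first N samples. This is a minimal sketch, assuming mono s16le data on a little-endian machine (fread stores the bytes in native order); the function name and chunk size are hypothetical. It takes any FILE * so it can be exercised without ffmpeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK 882  /* samples per read: 20 ms at 44.1 kHz, as in the question */

/* Hypothetical helper: reads raw s16le mono samples from `in` until EOF,
 * tracking the overall min and max sample. Returns the number of samples
 * read; a trailing odd byte, if any, is ignored. */
size_t scan_stream(FILE *in, int *min_value, int *max_value)
{
    int16_t buf[CHUNK];
    size_t total = 0, got;

    *min_value = INT16_MAX;
    *max_value = INT16_MIN;
    while ((got = fread(buf, sizeof buf[0], CHUNK, in)) > 0) {
        for (size_t i = 0; i < got; ++i) {
            if (buf[i] < *min_value) *min_value = buf[i];
            if (buf[i] > *max_value) *max_value = buf[i];
        }
        total += got;
    }
    return total;
}
```

With ffmpeg installed, it would be called on the pipein from the question, e.g. scan_stream(pipein, &lo, &hi), followed by pclose(pipein). Any other per-sample analysis (RMS, CSV output) can go inside the same inner loop.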

A similar question about converting an encoded audio file to text with signal values can be found on Stack Overflow: https://stackoverflow.com/questions/45576682/