ffmpeg - How to encode resampled PCM audio to AAC using the ffmpeg API when the input PCM sample count is not equal to 1024

Reposted. Author: 行者123. Last updated: 2023-12-02 08:49:12

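I am working on capturing audio and streaming it to an RTMP server. I work on macOS (in Xcode), so I use the AVFoundation framework to capture audio sample buffers. For encoding and streaming, however, I need to use the ffmpeg API and the libfaac encoder, so the output format must be AAC (to support stream playback on iOS devices).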

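The problem is this: my audio capture device (a Logitech camera in my case) delivers sample buffers of 512 LPCM samples, and I can choose the input sample rate from 16000, 24000, 36000, or 48000 Hz. When I feed these 512 samples to the AAC encoder (configured for the matching sample rate), I hear slow, jerky audio (it sounds as if there were a stretch of silence after every frame).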

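I figured out (maybe I am wrong) that the libfaac encoder only accepts audio frames of exactly 1024 samples. When I set the input sample rate to 24000 Hz and resample the input buffer to 48000 Hz before encoding, I get 1024 resampled samples, and after encoding these 1024 samples to AAC the output sounds correct. But my webcam produces 512 samples per buffer at every input sample rate, while the output sample rate must be 48000 Hz. So I have to resample in any case, and after resampling I will generally not have exactly 1024 samples in the buffer.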
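The sample counts above follow from plain rate arithmetic. Here is a minimal sketch, assuming a hypothetical helper `resampled_count` (not an ffmpeg function) that ignores any samples held back inside the resampler; the code in the question accounts for those with `swr_get_delay`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of ffmpeg): number of output samples
 * produced when in_samples at in_rate are converted to out_rate,
 * rounded up, ignoring samples buffered inside the resampler. */
static int64_t resampled_count(int64_t in_samples, int in_rate, int out_rate)
{
    assert(in_samples >= 0 && in_rate > 0 && out_rate > 0);
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For a 512-sample buffer resampled to 48000 Hz, this gives 1536, 1024, 683, and 512 samples for input rates of 16000, 24000, 36000, and 48000 Hz respectively, so only the 24000 Hz input happens to fill a 1024-sample frame exactly.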

Is there a way to solve this within the ffmpeg API?

I would be grateful for any help.

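PS: I guess I could accumulate resampled buffers until the sample count reaches 1024 and then encode them, but this is a stream, so there would be trouble with the resulting timestamps and with other input devices, and such a solution is not suitable.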

The current issue grew out of the problem described in [question]: How to fill audio AVFrame (ffmpeg) with the data obtained from CMSampleBufferRef (AVFoundation)?

Here is the code with the audio codec configuration (there is also a video stream, but video works fine):

    /* global variables */
    static AVFrame *aframe;
    static AVFrame *frame;
    AVOutputFormat *fmt;
    AVFormatContext *oc;
    AVStream *audio_st, *video_st;

    Init ()
    {
        AVCodec *audio_codec, *video_codec;
        int ret;

        avcodec_register_all();
        av_register_all();
        avformat_network_init();
        avformat_alloc_output_context2(&oc, NULL, "flv", filename);
        fmt = oc->oformat;
        oc->oformat->video_codec = AV_CODEC_ID_H264;
        oc->oformat->audio_codec = AV_CODEC_ID_AAC;
        video_st = NULL;
        audio_st = NULL;
        if (fmt->video_codec != AV_CODEC_ID_NONE)
        { /* … init video codec */ }
        if (fmt->audio_codec != AV_CODEC_ID_NONE) {
            audio_codec = avcodec_find_encoder(fmt->audio_codec);

            if (!(audio_codec)) {
                fprintf(stderr, "Could not find encoder for '%s'\n",
                        avcodec_get_name(fmt->audio_codec));
                exit(1);
            }
            audio_st = avformat_new_stream(oc, audio_codec);
            if (!audio_st) {
                fprintf(stderr, "Could not allocate stream\n");
                exit(1);
            }
            audio_st->id = oc->nb_streams - 1;

            // AAC: mono, 48 kHz, 16-bit signed samples, 32 kbit/s, low-complexity profile
            audio_st->codec->sample_fmt = AV_SAMPLE_FMT_S16;
            audio_st->codec->bit_rate = 32000;
            audio_st->codec->sample_rate = 48000;
            audio_st->codec->profile = FF_PROFILE_AAC_LOW;
            audio_st->time_base = (AVRational){1, audio_st->codec->sample_rate};
            audio_st->codec->channels = 1;
            audio_st->codec->channel_layout = AV_CH_LAYOUT_MONO;

            if (oc->oformat->flags & AVFMT_GLOBALHEADER)
                audio_st->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
        }

        if (video_st)
        {
            // …
            /* prepare video */
        }
        if (audio_st)
        {
            aframe = avcodec_alloc_frame();
            if (!aframe) {
                fprintf(stderr, "Could not allocate audio frame\n");
                exit(1);
            }
            AVCodecContext *c;
            int ret;

            c = audio_st->codec;

            ret = avcodec_open2(c, audio_codec, 0);
            if (ret < 0) {
                fprintf(stderr, "Could not open audio codec: %s\n", av_err2str(ret));
                exit(1);
            }

            //…
        }

Resampling and encoding audio:

    if (mType == kCMMediaType_Audio)
    {
        CMSampleTimingInfo timing_info;
        CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &timing_info);
        double pts = 0;
        double dts = 0;
        AVCodecContext *c;
        AVPacket pkt = { 0 }; // data and size must be 0;
        int got_packet, ret;
        av_init_packet(&pkt);
        c = audio_st->codec;
        CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);

        NSUInteger channelIndex = 0;

        // Get a pointer to the captured 16-bit PCM samples.
        CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
        size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
        size_t lengthAtOffset = 0;
        size_t totalLength = 0;
        SInt16 *samples = NULL;
        CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset, &lengthAtOffset, &totalLength, (char **)(&samples));

        const AudioStreamBasicDescription *audioDescription = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));

        // A new resampler context is created for every sample buffer.
        SwrContext *swr = swr_alloc();

        int in_smprt = (int)audioDescription->mSampleRate;
        av_opt_set_int(swr, "in_channel_layout", AV_CH_LAYOUT_MONO, 0);
        av_opt_set_int(swr, "out_channel_layout", audio_st->codec->channel_layout, 0);

        av_opt_set_int(swr, "in_channel_count", audioDescription->mChannelsPerFrame, 0);
        av_opt_set_int(swr, "out_channel_count", audio_st->codec->channels, 0);

        av_opt_set_int(swr, "in_sample_rate", audioDescription->mSampleRate, 0);
        av_opt_set_int(swr, "out_sample_rate", audio_st->codec->sample_rate, 0);

        av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_S16, 0);
        av_opt_set_sample_fmt(swr, "out_sample_fmt", audio_st->codec->sample_fmt, 0);

        swr_init(swr);
        uint8_t **input = NULL;
        int src_linesize;
        int in_samples = (int)numSamples;
        ret = av_samples_alloc_array_and_samples(&input, &src_linesize, audioDescription->mChannelsPerFrame,
                                                 in_samples, AV_SAMPLE_FMT_S16P, 0);

        // Point the first input plane at the captured samples.
        *input = (uint8_t *)samples;
        uint8_t *output = NULL;

        // Upper bound on the number of samples the resampler can return for this buffer.
        int out_samples = av_rescale_rnd(swr_get_delay(swr, in_smprt) + in_samples, (int)audio_st->codec->sample_rate, in_smprt, AV_ROUND_UP);

        av_samples_alloc(&output, NULL, audio_st->codec->channels, out_samples, audio_st->codec->sample_fmt, 0);
        in_samples = (int)numSamples;
        out_samples = swr_convert(swr, &output, out_samples, (const uint8_t **)input, in_samples);

        // The frame carries however many samples the resampler returned,
        // which is not necessarily 1024.
        aframe->nb_samples = (int)out_samples;

        ret = avcodec_fill_audio_frame(aframe, audio_st->codec->channels, audio_st->codec->sample_fmt,
                                       (uint8_t *)output,
                                       (int)out_samples *
                                       av_get_bytes_per_sample(audio_st->codec->sample_fmt) *
                                       audio_st->codec->channels, 1);

        aframe->channel_layout = audio_st->codec->channel_layout;
        aframe->channels = audio_st->codec->channels;
        aframe->sample_rate = audio_st->codec->sample_rate;

        // Convert the capture timestamp from seconds to the codec time base.
        if (timing_info.presentationTimeStamp.timescale != 0)
            pts = (double)timing_info.presentationTimeStamp.value / timing_info.presentationTimeStamp.timescale;

        aframe->pts = pts * audio_st->time_base.den;
        aframe->pts = av_rescale_q(aframe->pts, audio_st->time_base, audio_st->codec->time_base);

        ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);

        if (ret < 0) {
            fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
            exit(1);
        }
        swr_free(&swr);
        if (got_packet)
        {
            pkt.stream_index = audio_st->index;

            pkt.pts = av_rescale_q(pkt.pts, audio_st->codec->time_base, audio_st->time_base);
            pkt.dts = av_rescale_q(pkt.dts, audio_st->codec->time_base, audio_st->time_base);

            // Write the compressed frame to the media file.
            ret = av_interleaved_write_frame(oc, &pkt);
            if (ret != 0) {
                fprintf(stderr, "Error while writing audio frame: %s\n",
                        av_err2str(ret));
                exit(1);
            }

        }

Best answer

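I ended up here after running into a similar problem. I am reading audio and video from a Blackmagic Decklink SDI card at 720p50, which means each video frame comes with 960 audio samples (48k/50fps) that I wanted to encode together with the video. Sending only 960 samples at a time to aacenc produced really weird audio, and the encoder did not really complain about it either.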

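I started using AVAudioFifo (see ffmpeg/doc/examples/transcode_aac.c) and kept adding frames to it until it held enough samples to satisfy aacenc. I guess this means my samples play slightly too late, because the pts is set per 1024-sample frame while the first 960 samples should really carry a different value. As far as I can hear and see, though, it is not really noticeable.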
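The buffering described above can be sketched without ffmpeg as follows. This is a minimal, hypothetical stand-in for the role `AVAudioFifo` plays, assuming mono 16-bit samples and a pts counted in samples (time base 1/sample_rate). The names `SampleFifo`, `fifo_write`, and `fifo_read_frame` are made up for illustration. Each emitted frame takes its pts from the running count of samples already handed out, so timestamps stay continuous regardless of how the input was chunked:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE    1024  /* samples per AAC frame, as stated in the question */
#define FIFO_CAPACITY 8192  /* arbitrary capacity chosen for this sketch */

typedef struct {
    int16_t data[FIFO_CAPACITY];
    int     size;      /* samples currently buffered */
    int64_t next_pts;  /* pts of the oldest buffered sample, in samples */
} SampleFifo;

/* Append n samples; returns 0 on success, -1 if they would not fit. */
static int fifo_write(SampleFifo *f, const int16_t *src, int n)
{
    assert(n >= 0);
    if (f->size + n > FIFO_CAPACITY)
        return -1;
    memcpy(f->data + f->size, src, (size_t)n * sizeof(int16_t));
    f->size += n;
    return 0;
}

/* If a full frame is buffered, copy it to dst, store its pts in *pts
 * and return 1; otherwise leave the FIFO untouched and return 0. */
static int fifo_read_frame(SampleFifo *f, int16_t *dst, int64_t *pts)
{
    if (f->size < FRAME_SIZE)
        return 0;
    memcpy(dst, f->data, FRAME_SIZE * sizeof(int16_t));
    f->size -= FRAME_SIZE;
    /* Shift the remaining samples to the front of the buffer. */
    memmove(f->data, f->data + FRAME_SIZE, (size_t)f->size * sizeof(int16_t));
    *pts = f->next_pts;
    f->next_pts += FRAME_SIZE;
    return 1;
}
```

With 960-sample input chunks, the first call to `fifo_read_frame` after one write returns 0. After the second write it returns a 1024-sample frame with pts 0 and leaves 896 samples buffered. With ffmpeg itself, the same pattern would presumably use `av_audio_fifo_write`, `av_audio_fifo_size`, and `av_audio_fifo_read` from `libavutil/audio_fifo.h`, reading the opened encoder context's `frame_size` samples at a time.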

Regarding "ffmpeg - How to encode resampled PCM audio to AAC using the ffmpeg API when the input PCM sample count is not equal to 1024", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16904841/
