Repeating audio frames in a *.mov file created with AVFramework via AVAsset

Reposted by: 行者123  Updated: 2023-12-02 23:39:12

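I have run into some problems while trying to create a ProRes-encoded mov file using the AVFramework framework and AVAsset.

This is on OS X 10.10.5, using Xcode 7, linking against the 10.9 libraries.
So far I have managed to create valid ProRes files that contain video and multiple audio channels.

(I am creating multiple tracks of uncompressed 48 kHz, 16-bit PCM audio.)

Adding the video frames works well, and adding the audio frames works well too, or at least succeeds in code.

However, when I play the file back, the audio frames appear to repeat in sequences of 12, 13, 14, or 15 frames.

Looking at the waveform from the *.mov, the repeated audio is easy to see...

That is, the first 13 or X video frames all contain exactly the same audio, which is then repeated for the next X frames, and then again and again, and so on...

The video is fine; it is only the audio that seems to loop/repeat.

The problem appears no matter how many audio channels/tracks I use as the source; I have tested with just 1 track as well as with 4 and 8 tracks.

It is independent of the format and the number of samples I feed to the system: 720p60, 1080p23, and 1080i59 all show the same incorrect behavior.

Well, actually, the 720p capture seems to repeat the audio frames 30 or 31 times, whereas the 1080 formats only repeat the audio frames 12 or 13 times,

but I am definitely submitting different audio data to the audio encode / SampleBuffer creation process, as I have logged this in great detail (though it is not shown in the code below).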
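
As a rough sanity check on those repeat counts, the sketch below converts a run of video frames into a number of audio samples. It assumes nominal frame rates that the question does not state exactly (60 fps for 720p60 and 24000/1001 fps for 1080p23), and the function names are illustrative only.

```cpp
#include <cassert>

// Audio samples carried by one video frame. The frame rate is passed as a
// fraction (fpsNum / fpsDen) so that NTSC-style rates stay exact.
double samplesPerVideoFrame(double sampleRate, int fpsNum, int fpsDen)
{
    assert(fpsNum > 0 && fpsDen > 0);
    return sampleRate * fpsDen / fpsNum;
}

// Total audio samples covered by `frames` consecutive video frames.
double samplesInRun(double sampleRate, int fpsNum, int fpsDen, int frames)
{
    return samplesPerVideoFrame(sampleRate, fpsNum, fpsDen) * frames;
}
```

Under those assumptions, 30 frames of 720p60 cover 24,000 samples and 12 frames of 1080p23 cover 24,024 samples, so in both cases the repeated span is about half a second of audio. That may hint that the repeat length follows audio duration rather than video frame count, although nothing in the post confirms it.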

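I have tried a number of different ways to modify the code and expose the problem, with no success, so I am asking here in the hope that someone can either see a problem in my code or give me some information about this issue.

The code I am using is as follows: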

#import <AVFoundation/AVFoundation.h>  // AVAssetWriter and related classes
#include <vector>

int main(int argc, const char * argv[])
{
    @autoreleasepool
    {
        NSLog(@"Hello, World!  - Welcome to the ProResCapture With Audio sample app. ");
        OSStatus status;
        AudioStreamBasicDescription audioFormat;
        CMAudioFormatDescriptionRef audioFormatDesc = NULL; // checked against NULL before release

        // OK so lets include the hardware stuff first and then we can see about doing some actual capture  and compress stuff
        HARDWARE_HANDLE pHardware = sdiFactory();
        if (pHardware)
        {
            unsigned long ulUpdateType = UPD_FMT_FRAME;
            unsigned long ulFieldCount = 0;
            unsigned int numAudioChannels = 4; //8; //4;
            int numFramesToCapture = 300;

            gBFHancBuffer = (unsigned int*)myAlloc(gHANC_SIZE);

            int audioSize = 2002 * 4 * 16;
            short* pAudioSamples = (short*)new char[audioSize];
            std::vector<short*> vecOfNonInterleavedAudioSamplesPtrs;
            for (int i = 0; i < 16; i++)
            {
                vecOfNonInterleavedAudioSamplesPtrs.push_back((short*)myAlloc(2002 * sizeof(short)));
            }

            bool bVideoModeIsValid = SetupAndConfigureHardwareToCaptureIncomingVideo();

            if (bVideoModeIsValid)
            {

                gBFBytes = (BLUE_UINT32*)myAlloc(gGoldenSize);

                bool canAddVideoWriter = false;
                bool canAddAudioWriter = false;
                int nAudioSamplesWritten = 0;

                // declare the vars for our various AVAsset elements
                AVAssetWriter* assetWriter = nil;
                AVAssetWriterInput* assetWriterInputVideo = nil;
                AVAssetWriterInput* assetWriterAudioInput[16];


                AVAssetWriterInputPixelBufferAdaptor* adaptor = nil;
                NSURL* localOutputURL = nil;
                NSError* localError = nil;

                // create the file we are going to be writing to
                localOutputURL = [NSURL URLWithString:@"file:///Volumes/Media/ProResAVCaptureAnyFormat.mov"];

                assetWriter = [[AVAssetWriter alloc] initWithURL: localOutputURL fileType:AVFileTypeQuickTimeMovie error:&localError];
                if (assetWriter)
                {
                    assetWriter.shouldOptimizeForNetworkUse = NO;

                    // Lets configure the Audio and Video settings for this writer...
                    {
                          // Video First.

                          // Add a video input
                          // create a dictionary with the settings we want ie. Prores capture and width and height.
                          NSMutableDictionary* videoSettings = [NSMutableDictionary dictionaryWithObjectsAndKeys:
                                                                AVVideoCodecAppleProRes422, AVVideoCodecKey,
                                                                [NSNumber numberWithInt:width], AVVideoWidthKey,
                                                                [NSNumber numberWithInt:height], AVVideoHeightKey,
                                                                nil];

                          assetWriterInputVideo = [AVAssetWriterInput assetWriterInputWithMediaType: AVMediaTypeVideo outputSettings:videoSettings];
                          adaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:assetWriterInputVideo
                                                                                                     sourcePixelBufferAttributes:nil];

                          canAddVideoWriter = [assetWriter canAddInput:assetWriterInputVideo];
                    }

                    { // Add a Audio AssetWriterInput

                          // Create a dictionary with the settings we want ie. Uncompressed PCM audio 16 bit little endian.
                          NSMutableDictionary* audioSettings = [NSMutableDictionary dictionaryWithObjectsAndKeys:
                                                                [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
                                                                [NSNumber numberWithFloat:48000.0], AVSampleRateKey,
                                                                [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
                                                                [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
                                                                [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
                                                                [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
                                                                [NSNumber numberWithUnsignedInteger:1], AVNumberOfChannelsKey,
                                                                nil];

                          // OR use... FillOutASBDForLPCM(AudioStreamBasicDescription& outASBD, Float64 inSampleRate, UInt32 inChannelsPerFrame, UInt32 inValidBitsPerChannel, UInt32 inTotalBitsPerChannel, bool inIsFloat, bool inIsBigEndian, bool inIsNonInterleaved = false)
                          UInt32 inValidBitsPerChannel = 16;
                          UInt32 inTotalBitsPerChannel = 16;
                          bool inIsFloat = false;
                          bool inIsBigEndian = false;
                          UInt32 inChannelsPerTrack = 1;
                          FillOutASBDForLPCM(audioFormat, 48000.00, inChannelsPerTrack, inValidBitsPerChannel, inTotalBitsPerChannel, inIsFloat, inIsBigEndian);

                          status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault,
                                                                  &audioFormat,
                                                                  0,
                                                                  NULL,
                                                                  0,
                                                                  NULL,
                                                                  NULL,
                                                                  &audioFormatDesc
                                                                  );

                          for (int t = 0; t < numAudioChannels; t++)
                          {
                              assetWriterAudioInput[t] = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioSettings];
                              canAddAudioWriter = [assetWriter canAddInput:assetWriterAudioInput[t] ];

                              if (canAddAudioWriter)
                              {
                                  assetWriterAudioInput[t].expectsMediaDataInRealTime = YES; //true;
                                  [assetWriter addInput:assetWriterAudioInput[t] ];
                              }
                          }


                          CMFormatDescriptionRef myFormatDesc = assetWriterAudioInput[0].sourceFormatHint;
                          NSString* medType = [assetWriterAudioInput[0] mediaType];
                    }

                    if(canAddVideoWriter)
                    {
                          // tell the asset writer to expect media in real time.
                          assetWriterInputVideo.expectsMediaDataInRealTime = YES; //true;

                          // add the Input(s)
                          [assetWriter addInput:assetWriterInputVideo];

                          // Start writing the frames..
                          BOOL success = true;
                          success = [assetWriter startWriting];
                          CMTime startTime = CMTimeMake(0, fpsRate);
                          [assetWriter startSessionAtSourceTime:kCMTimeZero];
                          // [assetWriter startSessionAtSourceTime:startTime];

                      if (success)
                      {
                          startOurVideoCaptureProcess();

                          // **** possible enhancement is to use a pixelBufferPool to manage multiple buffers at once...
                          CVPixelBufferRef buffer = NULL;
                          int kRecordingFPS = fpsRate;
                          bool frameAdded = false;
                          unsigned int bufferID;


                          for( int i = 0; i < numFramesToCapture; i++)
                          {
                              printf("\n");

                              buffer = pixelBufferFromCard(bufferID, width, height, memFmt); // This function to get a CVBufferREf From our device, as well as getting the Audio data
                              while(!adaptor.assetWriterInput.readyForMoreMediaData)
                              {
                                    printf(" readyForMoreMediaData FAILED \n");
                              }

                              if (buffer)
                              {
                                  // Add video
                                  printf("appending Frame %d ", i);
                                  CMTime frameTime = CMTimeMake(i, kRecordingFPS);
                                  frameAdded = [adaptor appendPixelBuffer:buffer withPresentationTime:frameTime];
                                  if (frameAdded)
                                      printf("VideoAdded.....\n ");

                                  // Add Audio
                                  {
                                      // Do some Processing on the captured data to extract the interleaved Audio Samples for each channel
                                      struct hanc_decode_struct decode;
                                      DecodeHancFrameEx(gBFHancBuffer, decode);
                                      int nAudioSamplesCaptured = 0;
                                      if(decode.no_audio_samples > 0)
                                      {
                                          printf("completed deCodeHancEX, found %d samples \n", ( decode.no_audio_samples  / numAudioChannels) );
                                          nAudioSamplesCaptured = decode.no_audio_samples  / numAudioChannels;
                                      }

                                      CMTime audioTimeStamp = CMTimeMake(nAudioSamplesWritten, 48000); // (Samples Written) / sampleRate for audio


                                      // This function repacks the Audio from interleaved PCM data a vector of individual array of Audio data
                                      RepackDecodedHancAudio((void*)pAudioSamples, numAudioChannels, nAudioSamplesCaptured, vecOfNonInterleavedAudioSamplesPtrs);

                                      for (int t = 0; t < numAudioChannels; t++)
                                      {
                                          CMBlockBufferRef blockBuf = NULL; // ***********  MUST release these AFTER adding the samples to the assetWriter...
                                          CMSampleBufferRef cmBuf = NULL;

                                          int sizeOfSamplesInBytes = nAudioSamplesCaptured * 2;  // always 16bit memory samples...

                                          // Create sample Block buffer for adding to the audio input.
                                          status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                                                      (void*)vecOfNonInterleavedAudioSamplesPtrs[t],
                                                                                      sizeOfSamplesInBytes,
                                                                                      kCFAllocatorNull,
                                                                                      NULL,
                                                                                      0,
                                                                                      sizeOfSamplesInBytes,
                                                                                      0,
                                                                                      &blockBuf);

                                          if (status != noErr)
                                                NSLog(@"CMBlockBufferCreateWithMemoryBlock error");

                                          status = CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault,
                                                                                                   blockBuf,
                                                                                                   TRUE,
                                                                                                   0,
                                                                                                   NULL,
                                                                                                   audioFormatDesc,
                                                                                                   nAudioSamplesCaptured,
                                                                                                   audioTimeStamp,
                                                                                                   NULL,
                                                                                                   &cmBuf);
                                          if (status != noErr)
                                                NSLog(@"CMAudioSampleBufferCreateWithPacketDescriptions error");

                                          // let's check if the CMSampleBuf is valid
                                          bool bValid = CMSampleBufferIsValid(cmBuf);

                                          // examine this values for debugging info....
                                          CMTime cmTimeSampleDuration = CMSampleBufferGetDuration(cmBuf);
                                          CMTime cmTimePresentationTime = CMSampleBufferGetPresentationTimeStamp(cmBuf);

                                          if (!bValid)
                                              NSLog(@"Invalid Buffer found!!! possible CMSampleBufferCreate error?");


                                          if(!assetWriterAudioInput[t].readyForMoreMediaData)
                                              printf(" readyForMoreMediaData FAILED  - Had to Drop a frame\n");
                                          else
                                          {
                                              if(assetWriter.status == AVAssetWriterStatusWriting)
                                              {
                                                  BOOL r = YES;
                                                  r = [assetWriterAudioInput[t] appendSampleBuffer:cmBuf];
                                                  if (!r)
                                                  {
                                                      NSLog(@"appendSampleBuffer error");
                                                  }
                                                  else
                                                      success = true;

                                              }
                                              else
                                                  printf("AssetWriter Not ready???!? \n");
                                          }

                                          if (cmBuf)
                                          {
                                              CFRelease(cmBuf);
                                              cmBuf = 0;
                                          }
                                          if(blockBuf)
                                          {
                                              CFRelease(blockBuf);
                                              blockBuf = 0;
                                          }
                                      }
                                      nAudioSamplesWritten = nAudioSamplesWritten + nAudioSamplesCaptured;
                                  }

                                  if(success)
                                  {
                                      printf("Audio tracks Added..");
                                  }
                                  else
                                  {
                                      NSError* nsERR = [assetWriter error];
                                      printf("Problem Adding Audio tracks / samples");
                                  }
                                  printf("Success \n");
                              }


                              if (buffer)
                              {
                                  CVBufferRelease(buffer);
                              }
                          }
                      }
                          AVAssetWriterStatus sta = [assetWriter status];
                          CMTime endTime = CMTimeMake((numFramesToCapture-1), fpsRate);

                          if (audioFormatDesc)
                          {
                              CFRelease(audioFormatDesc);
                              audioFormatDesc = 0;
                          }

                          // Finish the session
                          StopVideoCaptureProcess();
                          [assetWriterInputVideo markAsFinished];
                          for (int t = 0; t < numAudioChannels; t++)
                          {
                              [assetWriterAudioInput[t] markAsFinished];
                          }

                          [assetWriter endSessionAtSourceTime:endTime];


                          bool finishedSuccessfully = [assetWriter finishWriting];
                          if (finishedSuccessfully)
                              NSLog(@"Writing file ended successfully \n");
                          else
                          {
                              NSLog(@"Writing file ended WITH ERRORS...");
                              sta = [assetWriter status];
                              if (sta != AVAssetWriterStatusCompleted)
                              {
                                  NSError* nsERR = [assetWriter error];
                                  printf("investigating the error \n");
                              }
                          }
                    }
                    else
                    {
                          NSLog(@"Unable to Add the InputVideo Asset Writer to the AssetWriter, file will not be written - Exiting");
                    }

                    if (audioFormatDesc)
                        CFRelease(audioFormatDesc);
                }


                for (int i = 0; i < 16; i++)
                {
                    if (vecOfNonInterleavedAudioSamplesPtrs[i])
                    {
                        bfFree(2002 * sizeof(unsigned short), vecOfNonInterleavedAudioSamplesPtrs[i]);
                        vecOfNonInterleavedAudioSamplesPtrs[i] = nullptr;
                    }
                }

            }
            else
            {
                NSLog(@"Unable to find a valid input signal - Exiting");
            }


            if (pAudioSamples)
                delete[] reinterpret_cast<char*>(pAudioSamples); // allocated with new char[]
        }
    }
    return 0;
}

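This is a very basic sample that connects to some special hardware (the code for which is left out).

It grabs frames of video and audio, then processes the audio from interleaved PCM into an individual array of PCM data for each track.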
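
The repacking helper (RepackDecodedHancAudio) is not shown in the post. As a minimal sketch of what such a step typically does, the hypothetical function below splits interleaved 16-bit PCM into one buffer per channel. It assumes frame-major interleaving (ch0, ch1, ..., then ch0 of the next frame), which the post does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the post's repacking step: split interleaved
// 16-bit PCM into one contiguous buffer per channel. `framesPerChannel`
// is the number of samples each channel contributes.
std::vector<std::vector<int16_t>> deinterleavePcm16(const int16_t* interleaved,
                                                    std::size_t numChannels,
                                                    std::size_t framesPerChannel)
{
    assert(interleaved != nullptr || framesPerChannel == 0);
    std::vector<std::vector<int16_t>> channels(
        numChannels, std::vector<int16_t>(framesPerChannel));
    for (std::size_t frame = 0; frame < framesPerChannel; ++frame)
    {
        for (std::size_t ch = 0; ch < numChannels; ++ch)
        {
            // Sample `ch` of frame `frame` sits at frame * numChannels + ch.
            channels[ch][frame] = interleaved[frame * numChannels + ch];
        }
    }
    return channels;
}
```

Because this version returns freshly allocated vectors on every call, each captured frame gets its own storage rather than overwriting buffers from the previous frame.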

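Each buffer is then added to the appropriate track, be it video or audio...

Lastly, the AVAsset stuff is finished and closed, and I exit and clean up.

Any help will be most appreciated,

Cheers,

James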

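Best answer

OK, I finally found a working solution to this problem.

The solution comes in two parts:

I moved away from CMAudioSampleBufferCreateWithPacketDescriptions
and now use CMSampleBufferCreate(..) with the appropriate arguments to that function call.

Initially, when experimenting with CMSampleBufferCreate, I was misusing some of the arguments, and it gave me the same results as I originally outlined here. After carefully checking the values I was passing in for the CMSampleTimingInfo struct, specifically the duration part, I eventually got everything working correctly!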
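
The answer does not list the values that finally worked. One common convention for fixed-rate PCM, assumed here rather than taken from the post, is a single timing entry whose duration is that of one sample (1 / sample rate) and whose presentation timestamp is the running sample count over the sample rate. The sketch below models only that arithmetic in plain C++; RationalTime and PcmTiming are hypothetical stand-ins for CMTime and for the duration and presentationTimeStamp fields of CMSampleTimingInfo.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CMTime: a rational time, value / timescale.
struct RationalTime
{
    int64_t value;
    int32_t timescale;
};

// Hypothetical stand-in for two fields of CMSampleTimingInfo.
struct PcmTiming
{
    RationalTime duration;              // duration of ONE sample, not of the whole buffer
    RationalTime presentationTimeStamp; // time of the first sample in the buffer
};

// Timing for a PCM buffer that starts after `samplesWritten` samples.
PcmTiming makePcmTiming(int64_t samplesWritten, int32_t sampleRate)
{
    assert(sampleRate > 0 && samplesWritten >= 0);
    return PcmTiming{ RationalTime{1, sampleRate},
                      RationalTime{samplesWritten, sampleRate} };
}

// Presentation time at which the next buffer should start, given that the
// current buffer holds `numSamples` samples.
RationalTime nextPresentationTime(const PcmTiming& timing, int64_t numSamples)
{
    return RationalTime{
        timing.presentationTimeStamp.value + numSamples * timing.duration.value,
        timing.presentationTimeStamp.timescale };
}
```

For example, a buffer of 2002 samples that starts at sample 2002 of a 48 kHz track would get a duration of 1/48000 and a timestamp of 2002/48000, and the next buffer would start at 4004/48000. Passing the duration of the whole buffer as the per-sample duration is an easy mistake to make with this struct.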

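So it appears I was creating the CMBlockBufferRef correctly, but I needed to take more care when using it to create the CMSampleBufferRef that I pass to the AVAssetWriterInput!

Hope this helps someone else, as it was a nasty one for me to solve!

James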

A similar question about repeating audio frames in a *.mov file created with AVFramework via AVAsset can be found on Stack Overflow: https://stackoverflow.com/questions/41114549/
