
ios - AAC encoding using AudioConverter and writing to AVAssetWriter


I am trying to encode audio buffers received from an AVCaptureSession using AudioConverter, and then append them to an AVAssetWriter.

I am not getting any errors (including OSStatus responses), and the CMSampleBuffers generated seem to contain valid data, however the resulting file does not have any playable audio. When writing together with video, the video frames stop getting appended a couple of frames in (appendSampleBuffer() returns false, but with no AVAssetWriter.error), probably because the asset writer is waiting for the audio to catch up. I suspect it's related to the way I am setting up the priming for AAC.

The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to understand for a wider audience.

Please check out the comments in the code below for more... comments.

Given a settings struct:

import Foundation
import AVFoundation
import CleanroomLogger

public struct AVSettings {

    let orientation: AVCaptureVideoOrientation = .Portrait
    let sessionPreset = AVCaptureSessionPreset1280x720
    let videoBitrate: Int = 2_000_000
    let videoExpectedFrameRate: Int = 30
    let videoMaxKeyFrameInterval: Int = 60

    let audioBitrate: Int = 32 * 1024

    /// Settings that are `0` mean variable rate.
    /// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
    /// with values based on the input stream.
    let audioOutputABSD = AudioStreamBasicDescription(
        mSampleRate: AVAudioSession.sharedInstance().sampleRate,
        mFormatID: kAudioFormatMPEG4AAC,
        mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
        mBytesPerPacket: 0,
        mFramesPerPacket: 1024,
        mBytesPerFrame: 0,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 0,
        mReserved: 0)

    let audioEncoderClassDescriptions = [
        AudioClassDescription(
            mType: kAudioEncoderComponentType,
            mSubType: kAudioFormatMPEG4AAC,
            mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]

}

Some helper functions:

public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
    switch (settings.sessionPreset, settings.orientation) {
    case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
    case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
    default: fatalError("Unsupported session preset and orientation")
    }
}

public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
    var result = noErr
    var absd = settings.audioOutputABSD
    var description: CMAudioFormatDescription?
    withUnsafePointer(&absd) { absdPtr in
        result = CMAudioFormatDescriptionCreate(nil,
                                                absdPtr,
                                                0, nil,
                                                0, nil,
                                                nil,
                                                &description)
    }

    if result != noErr {
        Log.error?.message("Could not create audio format description")
    }

    return description!
}

public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
    var result = noErr
    var description: CMVideoFormatDescription?
    let (width, height) = getVideoDimensions(fromSettings: settings)
    result = CMVideoFormatDescriptionCreate(nil,
                                            kCMVideoCodecType_H264,
                                            Int32(width),
                                            Int32(height),
                                            [:],
                                            &description)

    if result != noErr {
        Log.error?.message("Could not create video format description")
    }

    return description!
}

This is how the asset writer is initialized:

guard let audioDevice = defaultAudioDevice() else {
    throw RecordError.MissingDeviceFeature("Microphone") }

guard let videoDevice = defaultVideoDevice(.Back) else {
    throw RecordError.MissingDeviceFeature("Camera") }

let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                          outputSettings: nil,
                                          sourceFormatHint: videoFormatHint)

let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                          outputSettings: nil,
                                          sourceFormatHint: audioFormatHint)

writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
    .URLByAppendingPathExtension("mp4")

let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)

if !assetWriter.canAddInput(writerVideoInput) {
    throw RecordError.Unknown("Could not add video input") }

if !assetWriter.canAddInput(writerAudioInput) {
    throw RecordError.Unknown("Could not add audio input") }

assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)

And this is how the audio samples are being encoded; the problem area is most likely here. I've re-written this so that it doesn't use any Rx-isms.

var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

var converter: AudioConverterRef?

// Indicates whether priming information has been attached to the first buffer
var primed = false

func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {

    var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory

    // Create the audio converter if it's not available
    if converter == nil {
        var classDescriptions = settings.audioEncoderClassDescriptions
        var outputABSD = settings.audioOutputABSD
        outputABSD.mSampleRate = inputABSD.mSampleRate
        outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

        var newConverter: AudioConverterRef = nil
        var result = noErr
        result = withUnsafePointer(&outputABSD) { outputABSDPtr in
            return withUnsafePointer(&inputABSD) { inputABSDPtr in
                return AudioConverterNewSpecific(inputABSDPtr,
                                                 outputABSDPtr,
                                                 UInt32(classDescriptions.count),
                                                 &classDescriptions,
                                                 &newConverter)
            }
        }

        if result != noErr { throw RecordError.Unknown("Could not create audio converter") }
        converter = newConverter

        // At this point I made an attempt to retrieve priming info from
        // the audio converter assuming that it will give me back default values
        // I can use, but ended up with `nil`
        var primeInfo: AudioConverterPrimeInfo? = nil
        var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

        // The following returns `noErr` but `primeInfo` is still `nil`
        AudioConverterGetProperty(newConverter,
                                  kAudioConverterPrimeInfo,
                                  &primeInfoSize,
                                  &primeInfo)

        // I've also tried to set `kAudioConverterPrimeInfo` so that it knows
        // the leading frames that are being primed, but the set didn't seem to work
        // (`noErr` but getting the property afterwards still returned `nil`)
    }

    let converter = converter!

    // Need to give a big enough output buffer.
    // The assumption is that it will always be <= the input size
    let numSamples = CMSampleBufferGetNumSamples(buffer)
    // This becomes 1024 * 2 = 2048
    let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
    let outputBufferPtr = UnsafeMutablePointer<UInt8>.alloc(outputBufferSize)

    defer {
        outputBufferPtr.dealloc(outputBufferSize)
    }

    var result = noErr

    var outputPacketCount = UInt32(1)
    var outputData = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(
            mNumberChannels: outputABSD.mChannelsPerFrame,
            mDataByteSize: UInt32(outputBufferSize),
            mData: UnsafeMutablePointer<Void>(outputBufferPtr)))

    // See below for `EncodeAudioUserData`
    var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                       inputBytesPerPacket: inputABSD.mBytesPerPacket)

    withUnsafeMutablePointer(&userData) { userDataPtr in
        // See below for `fetchAudioProc`
        result = AudioConverterFillComplexBuffer(
            converter,
            fetchAudioProc,
            userDataPtr,
            &outputPacketCount,
            &outputData,
            nil)
    }

    if result != noErr {
        Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
        return nil
    }

    // See below for `CMSampleBufferCreateCopy`
    guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                   fromAudioBufferList: &outputData,
                                                   newFormatDescription: outputFormatDescription) else {
        Log.error?.message("Could not create sample buffer from audio buffer list")
        return nil
    }

    if !primed {
        primed = true
        // Simply picked 2112 samples based on convention, is there a better way to determine this?
        let samplesToPrime: Int64 = 2112
        let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
        let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)

        // Without setting the attachment the asset writer will complain about the
        // first buffer missing the `TrimDurationAtStart` attachment, is there a way
        // to infer the value from the given `AudioBufferList`?
        CMSetAttachment(newBuffer,
                        kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                        CMTimeCopyAsDictionary(primingDuration, nil),
                        kCMAttachmentMode_ShouldNotPropagate)
    }

    return newBuffer

}

Below is the proc that fetches samples for the audio converter, along with the struct whose data is passed to it:

private class EncodeAudioUserData {
    var inputSampleBuffer: CMSampleBuffer?
    var inputBytesPerPacket: UInt32
    // Keeps the retained block buffer alive while the converter reads from its data
    var inputBlockBuffer: CMBlockBuffer?

    init(inputSampleBuffer: CMSampleBuffer,
         inputBytesPerPacket: UInt32) {
        self.inputSampleBuffer = inputSampleBuffer
        self.inputBytesPerPacket = inputBytesPerPacket
    }
}

private let fetchAudioProc: AudioConverterComplexInputDataProc = {
    (inAudioConverter,
     ioDataPacketCount,
     ioData,
     outDataPacketDescriptionPtrPtr,
     inUserData) in

    var result = noErr

    if ioDataPacketCount.memory == 0 { return noErr }

    let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

    // If the buffer has already been processed there is nothing left to supply
    guard let buffer = userData.inputSampleBuffer else {
        ioDataPacketCount.memory = 0
        return -1
    }

    var inputBlockBuffer: CMBlockBuffer?
    var inputBufferList = AudioBufferList()
    result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
        buffer,
        nil,
        &inputBufferList,
        sizeof(AudioBufferList),
        nil,
        nil,
        0,
        &inputBlockBuffer)

    if result != noErr {
        Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
        ioDataPacketCount.memory = 0
        return result
    }

    let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
    ioDataPacketCount.memory = packetsCount

    ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
    ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
    ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData

    if outDataPacketDescriptionPtrPtr != nil {
        outDataPacketDescriptionPtrPtr.memory = nil
    }

    // Keep the block buffer alive until the converter is done with its data,
    // and mark the sample buffer as processed for subsequent invocations
    userData.inputBlockBuffer = inputBlockBuffer
    userData.inputSampleBuffer = nil

    return noErr
}

And this is how I am transforming the AudioBufferList into a CMSampleBuffer:

public func CMSampleBufferCreateCopy(
    buffer: CMSampleBuffer,
    inout fromAudioBufferList bufferList: AudioBufferList,
    newFormatDescription formatDescription: CMFormatDescription? = nil)
    -> CMSampleBuffer? {

    var result = noErr

    var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
    // Copy timing info from the previous buffer
    var timingInfo = CMSampleTimingInfo()
    result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)

    if result != noErr { return nil }

    var newBuffer: CMSampleBuffer?
    result = CMSampleBufferCreateReady(
        kCFAllocatorDefault,
        nil,
        formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
        Int(bufferList.mNumberBuffers),
        1, &timingInfo,
        1, &sizeArray,
        &newBuffer)

    if result != noErr { return nil }
    guard let b = newBuffer else { return nil }

    CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
    return newBuffer

}

Is there anything obvious that I'm doing wrong? Is there a proper way to construct a CMSampleBuffer from an AudioBufferList? How do you transfer the priming information from the converter to the CMSampleBuffers that you create?

For my use case I need to do the encoding manually, since the buffers will be manipulated further down the pipeline (although I've disabled all transformations after the encode in order to make sure it works.)

Any help would be much appreciated. Sorry for the amount of code to digest, but I wanted to provide as much context as possible.

Thanks in advance :)



Best Answer

It turns out there were a number of things that I was doing wrong. Instead of posting a wall of code, I am going to try to organize this into bite-sized pieces of things that I discovered.


Samples vs Packets vs Frames

This was a huge source of confusion for me:

  1. Each CMSampleBuffer can contain 1 or more sample buffers (discovered via CMSampleBufferGetNumSamples).
  2. Each CMSampleBuffer that contains 1 sample represents a single audio packet.
  3. Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
  4. Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and the frame count. (See the sketch after this list.)
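
As a minimal sketch of those relationships (assuming `buffer` is a linear PCM CMSampleBuffer coming from the capture callback), the values can be inspected like this:

// Sketch: inspect packet/frame counts of an incoming PCM buffer
let packets = CMSampleBufferGetNumSamples(buffer)
let absd = CMAudioFormatDescriptionGetStreamBasicDescription(
    CMSampleBufferGetFormatDescription(buffer)!).memory

// For linear PCM `mFramesPerPacket` is 1, so packets == frames here
let frames = packets * Int(absd.mFramesPerPacket)
let totalBytes = frames * Int(absd.mBytesPerFrame)
print("packets: \(packets), frames: \(frames), bytes: \(totalBytes)")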

AudioConverterComplexInputDataProc

This callback is invoked to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I have been using the converter for real-time push-style encoding, I needed to ensure that each data push contains that minimum number of packets. Something like this (in pseudo-code):

let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalSize(buffers) < minimumPackets {
    buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
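
A slightly more concrete sketch of that accumulation step; note that `nextCaptureBuffer()` is a hypothetical stand-in for however your pipeline dequeues incoming PCM buffers:

// Sketch: gather at least 1024 PCM packets before driving the converter
// (AAC consumes 1024 frames per output packet; for linear PCM input,
// 1 packet == 1 frame)
let minimumPackets = 1024
var pendingBuffers: [CMSampleBuffer] = []
var pendingPackets = 0

while pendingPackets < minimumPackets {
    let next = nextCaptureBuffer() // hypothetical: dequeue the next PCM buffer
    pendingBuffers.append(next)
    pendingPackets += CMSampleBufferGetNumSamples(next)
}
// ...then drive AudioConverterFillComplexBuffer with `pendingBuffers`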

Slicing CMSampleBuffer

If a CMSampleBuffer contains multiple buffers, you can actually slice it. The tool for doing that is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling the timing information of the resulting encoded buffer easier. If you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
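
For instance, a minimal sketch of carving the first 1024 packets out of a larger buffer (assuming `buffer` contains at least that many samples):

// Sketch: copy exactly 1024 packets so the converter gets what it asked for
var slice: CMSampleBuffer?
let result = CMSampleBufferCopySampleBufferForRange(
    kCFAllocatorDefault,
    buffer,
    CFRangeMake(0, 1024), // range of samples (packets) to copy
    &slice)
if result != noErr { /* handle the error */ }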


Priming and trim duration

When doing AAC encoding, you must set the trim duration like so:

CMSetAttachment(buffer,
                kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
                kCMAttachmentMode_ShouldNotPropagate)

One of the things I was doing wrong was adding the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.

Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:

  • Priming frames: 2112
  • Sample rate: 44100
  • Priming duration: 2112 / 44100 = ~0.0479s
  • First buffer, frames: 1024, trim duration: 1024 / 44100
  • Second buffer, frames: 1024, trim duration: 1088 / 44100
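
A small sketch of that arithmetic as CMTime values (assuming the conventional 2112 priming frames at 44100 Hz; `firstBuffer` and `secondBuffer` are hypothetical names for your first two encoded output buffers):

// Sketch: trim durations for the first two encoded AAC buffers
let primingFrames: Int64 = 2112
let sampleRate: Int32 = 44100

// The first 1024-frame buffer is entirely priming...
let firstTrim = CMTimeMake(1024, sampleRate)
// ...and the second carries the remaining 2112 - 1024 = 1088 priming frames
let secondTrim = CMTimeMake(primingFrames - 1024, sampleRate)

CMSetAttachment(firstBuffer,
                kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                CMTimeCopyAsDictionary(firstTrim, kCFAllocatorDefault),
                kCMAttachmentMode_ShouldNotPropagate)
CMSetAttachment(secondBuffer,
                kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                CMTimeCopyAsDictionary(secondTrim, kCFAllocatorDefault),
                kCMAttachmentMode_ShouldNotPropagate)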

Creating the new CMSampleBuffer

AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr, and you should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:

var bufferList: AudioBufferList                        // filled in by AudioConverterFillComplexBuffer
var packetDescriptions: [AudioStreamPacketDescription] // filled in by AudioConverterFillComplexBuffer
var newBuffer: CMSampleBuffer?

CMAudioSampleBufferCreateWithPacketDescriptions(
    kCFAllocatorDefault,                            // allocator
    nil,                                            // dataBuffer
    false,                                          // dataReady
    nil,                                            // makeDataReadyCallback
    nil,                                            // makeDataReadyRefCon
    formatDescription,                              // formatDescription
    Int(bufferList.mNumberBuffers),                 // numSamples
    CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
    &packetDescriptions,                            // packetDescriptions
    &newBuffer)
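
For completeness, a sketch of how those packet descriptions might be obtained, reusing the variables from the question's encode function (inside its withUnsafeMutablePointer closure); the only change from the original call is passing an array instead of nil as the last argument:

// Sketch: have the converter fill in one description per output packet
var outputPacketCount = UInt32(1)
var packetDescriptions = [AudioStreamPacketDescription](
    count: Int(outputPacketCount),
    repeatedValue: AudioStreamPacketDescription())

result = AudioConverterFillComplexBuffer(
    converter,
    fetchAudioProc,
    userDataPtr,
    &outputPacketCount,
    &outputData,
    &packetDescriptions) // instead of `nil`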

Regarding ios - AAC encoding using AudioConverter and writing to AVAssetWriter, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36351327/
