
swift - What can cause lag in recurrent calls to the draw() function of a MetalKit MTKView


I am designing a Cocoa application in Swift 4.0, using the MetalKit API, for macOS 10.13. Everything I report here was done on my 2015 MBPro.
I have successfully implemented an MTKView which renders simple geometry with low vertex counts very well (cubes, triangles, etc.). I implemented a mouse-drag-based camera which rotates, tilts and zooms. Here is a screenshot of the Xcode FPS debug gauge while I rotate the cube:
However, when I try to load a dataset containing a mere 1,500 vertices (each stored as 7 × 32-bit floats... i.e. 42 KB), I start getting a very bad lag in FPS. I will show the code implementation below. Here is a screenshot (note that in this image the view contains only a few vertices, which are rendered as large points):
Here is my implementation:
1) viewDidLoad():

override func viewDidLoad() {
    super.viewDidLoad()

    // Initialization of the projection matrix and camera
    self.projectionMatrix = float4x4.makePerspectiveViewAngle(float4x4.degrees(toRad: 85.0),
                                                              aspectRatio: Float(self.view.bounds.size.width / self.view.bounds.size.height),
                                                              nearZ: 0.01, farZ: 100.0)
    self.vCam = ViewCamera()

    // Initialization of the MTLDevice
    metalView.device = MTLCreateSystemDefaultDevice()
    device = metalView.device
    metalView.colorPixelFormat = .bgra8Unorm

    // Initialization of the shader library
    let defaultLibrary = device.makeDefaultLibrary()!
    let fragmentProgram = defaultLibrary.makeFunction(name: "basic_fragment")
    let vertexProgram = defaultLibrary.makeFunction(name: "basic_vertex")

    // Initialization of the MTLRenderPipelineState
    let pipelineStateDescriptor = MTLRenderPipelineDescriptor()
    pipelineStateDescriptor.vertexFunction = vertexProgram
    pipelineStateDescriptor.fragmentFunction = fragmentProgram
    pipelineStateDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineStateDescriptor)

    // Initialization of the MTLCommandQueue
    commandQueue = device.makeCommandQueue()

    // Initialization of Delegates and BufferProvider for View and Projection matrix MTLBuffers
    self.metalView.delegate = self
    self.metalView.eventDelegate = self
    self.bufferProvider = BufferProvider(device: device,
                                         inflightBuffersCount: 3,
                                         sizeOfUniformsBuffer: MemoryLayout<Float>.size * float4x4.numberOfElements() * 2)
}

2) Loading the MTLBuffer for the cube's vertices:
private func makeCubeVertexBuffer() {
    let cube = Cube()
    let vertices = cube.verticesArray
    var vertexData = Array<Float>()
    for vertex in vertices {
        vertexData += vertex.floatBuffer()
    }
    VDataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])
    self.vertexBuffer = device.makeBuffer(bytes: vertexData, length: VDataSize!, options: [])!
    self.vertexCount = vertices.count
}

3) Loading the MTLBuffer for the dataset's vertices. Note that I explicitly declare this buffer's storage mode as private in order to ensure efficient access to the data by the GPU, since the CPU does not need to access the data once the buffer is loaded. Also note that I am loading only 1/100th of the vertices in the actual dataset, because my machine's whole OS starts lagging when I try to load it fully (only 4.2 MB of data).
public func loadDataset(datasetVolume: DatasetVolume) {
    // Load dataset vertices
    self.datasetVolume = datasetVolume
    self.datasetVertexCount = self.datasetVolume!.vertexCount/100
    let rgbaVertices = self.datasetVolume!.rgbaPixelVolume[0...(self.datasetVertexCount!-1)]
    var vertexData = Array<Float>()
    for vertex in rgbaVertices {
        vertexData += vertex.floatBuffer()
    }
    let dataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])

    // Make two MTLBuffers: one with Shared storage mode, in which the data is initially loaded, and a second one with Private storage mode
    self.datasetVertexBuffer = device.makeBuffer(bytes: vertexData, length: dataSize, options: MTLResourceOptions.storageModeShared)
    self.datasetVertexBufferGPU = device.makeBuffer(length: dataSize, options: MTLResourceOptions.storageModePrivate)

    // Create a MTLCommandBuffer and blit the vertex data from the Shared MTLBuffer to the Private MTLBuffer
    let commandBuffer = self.commandQueue.makeCommandBuffer()
    let blitEncoder = commandBuffer!.makeBlitCommandEncoder()
    blitEncoder!.copy(from: self.datasetVertexBuffer!, sourceOffset: 0,
                      to: self.datasetVertexBufferGPU!, destinationOffset: 0, size: dataSize)
    blitEncoder!.endEncoding()
    commandBuffer!.commit()

    // Clean up
    self.datasetLoaded = true
    self.datasetVertexBuffer = nil
}

4) Finally, here is the render loop. Again, this is using MetalKit.
func draw(in view: MTKView) {
    render(view.currentDrawable)
}

private func render(_ drawable: CAMetalDrawable?) {
    guard let drawable = drawable else { return }

    // Make sure an MTLBuffer for the View and Projection matrices is available
    _ = self.bufferProvider?.availableResourcesSemaphore.wait(timeout: DispatchTime.distantFuture)

    // Initialize common RenderPassDescriptor
    let renderPassDescriptor = MTLRenderPassDescriptor()
    renderPassDescriptor.colorAttachments[0].texture = drawable.texture
    renderPassDescriptor.colorAttachments[0].loadAction = .clear
    renderPassDescriptor.colorAttachments[0].clearColor = Colors.White
    renderPassDescriptor.colorAttachments[0].storeAction = .store

    // Initialize a CommandBuffer and add a CompletedHandler to release an MTLBuffer from the BufferProvider once the GPU is done processing this command
    let commandBuffer = self.commandQueue.makeCommandBuffer()
    commandBuffer?.addCompletedHandler { (_) in
        self.bufferProvider?.availableResourcesSemaphore.signal()
    }

    // Update the View matrix and obtain an MTLBuffer for it and the projection matrix
    let camViewMatrix = self.vCam.getLookAtMatrix()
    let uniformBuffer = bufferProvider?.nextUniformsBuffer(projectionMatrix: projectionMatrix, camViewMatrix: camViewMatrix)

    // Initialize a MTLParallelRenderCommandEncoder
    let parallelEncoder = commandBuffer?.makeParallelRenderCommandEncoder(descriptor: renderPassDescriptor)

    // Create a CommandEncoder for the cube vertices if its data is loaded
    if self.cubeLoaded == true {
        let cubeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
        cubeRenderEncoder!.setCullMode(MTLCullMode.front)
        cubeRenderEncoder!.setRenderPipelineState(pipelineState)
        cubeRenderEncoder!.setTriangleFillMode(MTLTriangleFillMode.fill)
        cubeRenderEncoder!.setVertexBuffer(self.cubeVertexBuffer, offset: 0, index: 0)
        cubeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)
        cubeRenderEncoder!.endEncoding()
    }

    // Create a CommandEncoder for the dataset vertices if its data is loaded
    if self.datasetLoaded == true {
        let rgbaVolumeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
        rgbaVolumeRenderEncoder!.setRenderPipelineState(pipelineState)
        rgbaVolumeRenderEncoder!.setVertexBuffer(self.datasetVertexBufferGPU!, offset: 0, index: 0)
        rgbaVolumeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)
        rgbaVolumeRenderEncoder!.endEncoding()
    }

    // End CommandBuffer encoding and commit task
    parallelEncoder!.endEncoding()
    commandBuffer!.present(drawable)
    commandBuffer!.commit()
}

OK, here are the steps I've gone through trying to find the cause of the lag, keeping in mind that the lagging effect is proportional to the size of the dataset's vertex buffer:
At first I thought it was due to the GPU not being able to access the memory fast enough because it was in shared storage mode -> I changed the dataset MTLBuffer to private storage mode. This did not solve the problem.
Then I thought the problem was due to the CPU spending too much time in my render() function. This could have been because of a problem with the BufferProvider, or maybe because the CPU was somehow reprocessing/reloading the dataset vertex buffer every frame -> To check this I used the Time Profiler in Xcode's Instruments. Unfortunately, it seems that the problem is instead that the application calls this render method (in other words, the MTKView's draw() method) only very rarely. Here are some screenshots:
The spike at ~10 seconds is when the cube was loaded
The spikes between ~25-35 seconds are when the dataset was loaded
This image (^) shows the activity between ~10-20 seconds, just after the cube was loaded. The FPS is at ~60 at this point. You can see that the main thread spends around 53 ms in the render() function during these 10 seconds.
This image (^) shows the activity between ~40-50 seconds, after the dataset was loaded, when the FPS is <10. You can see that the main thread spends around 4 ms in the render() function during these 10 seconds. As you can see, none of the methods which are usually called from within this function are called (i.e.: the ones we can see called when only the cube is loaded, previous image). Of note, when I load the dataset, the Time Profiler's timer starts jumping (i.e.: it stops for a few seconds and then jumps to the current time... and repeats).
So here is where I am. The problem seems to be that the CPU somehow gets overloaded by these 42 KB of data... recursively. I also did a test with the Allocator in Xcode's Instruments. No signs of memory leak, as far as I could tell (you might have noticed that a lot of this is new to me).
Sorry for the convoluted post, I hope it's not too hard to follow. Thank you all in advance for your help.
Edit:
Here are my shaders, in case you would like to see them:
struct VertexIn {
    packed_float3 position;
    packed_float4 color;
};

struct VertexOut {
    float4 position [[position]];
    float4 color;
    float size [[point_size]];
};

struct Uniforms {
    float4x4 cameraMatrix;
    float4x4 projectionMatrix;
};


vertex VertexOut basic_vertex(const device VertexIn* vertex_array [[ buffer(0) ]],
                              constant Uniforms& uniforms [[ buffer(1) ]],
                              unsigned int vid [[ vertex_id ]]) {

    float4x4 cam_Matrix = uniforms.cameraMatrix;
    float4x4 proj_Matrix = uniforms.projectionMatrix;

    VertexIn VertexIn = vertex_array[vid];

    VertexOut VertexOut;
    VertexOut.position = proj_Matrix * cam_Matrix * float4(VertexIn.position,1);
    VertexOut.color = VertexIn.color;
    VertexOut.size = 15;

    return VertexOut;
}

fragment half4 basic_fragment(VertexOut interpolated [[stage_in]]) {
    return half4(interpolated.color[0], interpolated.color[1], interpolated.color[2], interpolated.color[3]);
}

Best answer

I think the main problem is that you're telling Metal to do instanced drawing when you shouldn't be. This line:

rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)

is telling Metal to draw datasetVertexCount! instances of each of your datasetVertexCount! vertices. The GPU's work is growing with the square of the vertex count. Also, since you're not using the instance ID to, for example, adjust the vertex position, all of those instances are identical and thus redundant.
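A minimal sketch of the fix, using the identifiers from the question's render() function: draw each point exactly once and omit instanceCount, which defaults to a single instance.

```swift
// Non-instanced point draw: datasetVertexCount! vertices, one instance total,
// so GPU work grows linearly with the vertex count instead of quadratically.
rgbaVolumeRenderEncoder!.drawPrimitives(type: .point,
                                        vertexStart: 0,
                                        vertexCount: datasetVertexCount!)
```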
I think the same is true of this line:
cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)

although it's not clear what vertexCount is and how it may differ from cubeVertexCount. In any case, since it appears you're using the same pipeline state, and thus the same shaders, which don't make use of the instance ID, the extra instances are still useless and wasteful.
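Assuming cubeVertexCount! really is the total number of cube vertices to draw (the question leaves that ambiguous), the non-instanced version of the cube call would be a sketch like:

```swift
// Draw the cube's triangles once, with no redundant instances.
// Whether vertexCount! or cubeVertexCount! is the right count depends on
// how the question's code defines them.
cubeRenderEncoder!.drawPrimitives(type: .triangle,
                                  vertexStart: 0,
                                  vertexCount: self.cubeVertexCount!)
```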
Other things:
Why are you using an MTLParallelRenderCommandEncoder when you're not actually making use of the parallelism it enables? Don't do that.
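A sketch of what render() could do instead: create a single ordinary render command encoder from the command buffer and encode both draws on it sequentially (all identifiers here are from the question's code, and the draw calls are shown in their non-instanced form).

```swift
// One MTLRenderCommandEncoder straight from the command buffer; no parallel
// sub-encoders, since everything is encoded on a single thread anyway.
let encoder = commandBuffer!.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
if self.cubeLoaded == true {
    encoder.setCullMode(.front)
    encoder.setRenderPipelineState(pipelineState)
    encoder.setTriangleFillMode(.fill)
    encoder.setVertexBuffer(self.cubeVertexBuffer, offset: 0, index: 0)
    encoder.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: self.cubeVertexCount!)
}
if self.datasetLoaded == true {
    encoder.setRenderPipelineState(pipelineState)
    encoder.setVertexBuffer(self.datasetVertexBufferGPU!, offset: 0, index: 0)
    encoder.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
    encoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!)
}
encoder.endEncoding()
```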
Everywhere you're using the size method of MemoryLayout, you should almost certainly be using stride instead. And if you're computing the stride of a compound data structure, do not take the stride of one element of that structure and multiply it by the number of elements. Take the stride of the whole data structure in one step.
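For example (a sketch; vertexData and vertices are the arrays from the question's code, and Vertex is a hypothetical compound vertex struct):

```swift
// Prefer stride over size: stride includes the padding the GPU will see
// between consecutive elements when it indexes the buffer as an array.
let dataSize = vertexData.count * MemoryLayout<Float>.stride

// For a buffer of compound vertex structs, take the stride of the whole
// struct in one step, not element-stride multiplied by element count:
// let bufferLength = vertices.count * MemoryLayout<Vertex>.stride
```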

Regarding "swift - What can cause lag in recurrent calls to the draw() function of a MetalKit MTKView", see the original question on Stack Overflow: https://stackoverflow.com/questions/47604638/
