gpt4 book ai didi

c# - 如何优化 C# 中数组的复制 block ?

转载 作者:可可西里 更新时间:2023-11-01 03:10:02 26 4
gpt4 key购买 nike

我正在编写一个实时视频成像应用程序,需要加快此方法的速度。目前执行大约需要 10 毫秒,我希望将其缩短至 2-3 毫秒。

我已经尝试了 Array.Copy 和 Buffer.BlockCopy,它们都需要大约 30 毫秒,比手动复制长 3 倍。

一种想法是以某种方式将 4 个字节复制为一个整数,然后将它们作为一个整数粘贴,从而将 4 行代码减少为一行代码。但是,我不确定该怎么做。

另一个想法是以某种方式使用指针和不安全代码来做到这一点,但我也不确定该怎么做。

非常感谢所有帮助。谢谢!

编辑:数组大小为:inputBuffer[327680]、lookupTable[16384]、outputBuffer[1310720]

public byte[] ApplyLookupTableToBuffer(byte[] lookupTable, ushort[] inputBuffer)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();

// Precalculate and initialize the variables
int lookupTableLength = lookupTable.Length;
int bufferLength = inputBuffer.Length;
byte[] outputBuffer = new byte[bufferLength * 4];
int outIndex = 0;
int curPixelValue = 0;

// For each pixel in the input buffer...
for (int curPixel = 0; curPixel < bufferLength; curPixel++)
{
outIndex = curPixel * 4; // Calculate the corresponding index in the output buffer
curPixelValue = inputBuffer[curPixel] * 4; // Retrieve the pixel value and multiply by 4 since the lookup table has 4 values (blue/green/red/alpha) for each pixel value

// If the multiplied pixel value falls within the lookup table...
if ((curPixelValue + 3) < lookupTableLength)
{
// Copy the lookup table value associated with the value of the current input buffer location to the output buffer
outputBuffer[outIndex + 0] = lookupTable[curPixelValue + 0];
outputBuffer[outIndex + 1] = lookupTable[curPixelValue + 1];
outputBuffer[outIndex + 2] = lookupTable[curPixelValue + 2];
outputBuffer[outIndex + 3] = lookupTable[curPixelValue + 3];

//System.Buffer.BlockCopy(lookupTable, curPixelValue, outputBuffer, outIndex, 4); // Takes 2-10x longer than just copying the values manually
//Array.Copy(lookupTable, curPixelValue, outputBuffer, outIndex, 4); // Takes 2-10x longer than just copying the values manually
}
}

Debug.WriteLine("ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
return outputBuffer;
}

编辑:我更新了保持相同变量名称的方法,以便其他人可以看到代码将如何根据下面的 HABJAN 解决方案进行转换。

    public byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();

// Precalculate and initialize the variables
int lookupTableLength = lookupTable.Length;
int bufferLength = inputBuffer.Length;
byte[] outputBuffer = new byte[bufferLength * 4];
//int outIndex = 0;
int curPixelValue = 0;

unsafe
{
fixed (byte* pointerToOutputBuffer = &outputBuffer[0])
fixed (byte* pointerToLookupTable = &lookupTable[0])
{
// Cast to integer pointers since groups of 4 bytes get copied at once
uint* lookupTablePointer = (uint*)pointerToLookupTable;
uint* outputBufferPointer = (uint*)pointerToOutputBuffer;

// For each pixel in the input buffer...
for (int curPixel = 0; curPixel < bufferLength; curPixel++)
{
// No need to multiply by 4 on the following 2 lines since the pointers are for integers, not bytes
// outIndex = curPixel; // This line is commented since we can use curPixel instead of outIndex
curPixelValue = inputBuffer[curPixel]; // Retrieve the pixel value

if ((curPixelValue + 3) < lookupTableLength)
{
outputBufferPointer[curPixel] = lookupTablePointer[curPixelValue];
}
}
}
}

Debug.WriteLine("2 ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
return outputBuffer;
}

最佳答案

我做了一些测试,通过使用 RtlMoveMemory API 将我的代码变为不安全状态,我设法达到了最大速度。我发现 Buffer.BlockCopyArray.Copy 比直接使用 RtlMoveMemory 慢得多。

所以,最后你会得到这样的结果:

fixed(byte* ptrOutput= &outputBufferBuffer[0])
{
MoveMemory(ptrOutput, ptrInput, 4);
}

[DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
private static unsafe extern void MoveMemory(void* dest, void* src, int size);

编辑:

好的,现在当我弄清楚您的逻辑并进行一些测试时,我设法将您的方法的速度提高了将近 50%。由于您需要复制一个小数据 block (总是 4 个字节),是的,您是对的,RtlMoveMemory 在这里无济于事,最好将数据复制为整数。这是我想出的最终解决方案:

public static byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
{
int lookupTableLength = lookupTable.Length;
int bufferLength = inputBuffer.Length;
byte[] outputBuffer = new byte[bufferLength * 4];
int outIndex = 0, curPixelValue = 0;

unsafe
{
fixed (byte* ptrOutput = &outputBuffer[0])
fixed (byte* ptrLookup = &lookupTable[0])
{
uint* lkp = (uint*)ptrLookup;
uint* opt = (uint*)ptrOutput;

for (int index = 0; index < bufferLength; index++)
{
outIndex = index;
curPixelValue = inputBuffer[index];

if ((curPixelValue + 3) < lookupTableLength)
{
opt[outIndex] = lkp[curPixelValue];
}
}
}
}

return outputBuffer;
}

我将您的方法重命名为 ApplyLookupTableToBufferV1

这是我的测试结果:

int tc1 = Environment.TickCount;

for (int i = 0; i < 200; i++)
{
byte[] a = ApplyLookupTableToBufferV1(lt, ib);
}

tc1 = Environment.TickCount - tc1;

Console.WriteLine("V1: " + tc1.ToString() + "ms");

结果 - V1:998 毫秒

int tc2 = Environment.TickCount;

for (int i = 0; i < 200; i++)
{
byte[] a = ApplyLookupTableToBufferV2(lt, ib);
}

tc2 = Environment.TickCount - tc2;

Console.WriteLine("V2: " + tc2.ToString() + "ms");

结果 - V2:473 毫秒

关于c# - 如何优化 C# 中数组的复制 block ?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21097931/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com