
c# - Why is casting a struct via a pointer slow, while Unsafe.As is fast?

Reposted. Author: 太空狗. Updated: 2023-10-29 18:24:23

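Background

I want to make a few integer-sized structs (i.e. 32 and 64 bits) that are easily convertible to and from primitive unmanaged types of the same size (i.e. Int32 and UInt32, for the 32-bit-sized struct in particular).

The structs would then expose additional functionality for bit manipulation / indexing that is not directly available on integer types. Basically, a sort of syntactic sugar that improves readability and ease of use.

The important part, however, is performance: this extra abstraction should cost essentially nothing (at the end of the day the CPU should "see" the same bits as if it were dealing with primitive ints).

Sample struct

Below is just the most basic struct I came up with. It doesn't have all the functionality, but it is enough to illustrate my questions: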

[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32 {
    [FieldOffset(3)]
    public byte Byte1;
    [FieldOffset(2)]
    public ushort UShort1;
    [FieldOffset(2)]
    public byte Byte2;
    [FieldOffset(1)]
    public byte Byte3;
    [FieldOffset(0)]
    public ushort UShort2;
    [FieldOffset(0)]
    public byte Byte4;

    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe implicit operator Mask32(int i) => *(Mask32*)&i;
    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe implicit operator Mask32(uint i) => *(Mask32*)&i;
}
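
To make the overlapping layout concrete, here is a minimal sketch (not part of the original post) that writes the two ushort fields and reads the four byte fields back. The struct is a trimmed copy without the conversion operators, and `OverlayDemo` / `FromHalves` are hypothetical names used only for illustration. The byte order shown assumes a little-endian machine such as x64.

```csharp
using System;
using System.Runtime.InteropServices;

// Trimmed copy of the struct above: fields only, no conversion operators.
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public ushort UShort1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public ushort UShort2;
    [FieldOffset(0)] public byte Byte4;
}

public static class OverlayDemo
{
    // Hypothetical helper: fills the struct through its two 16-bit fields.
    public static Mask32 FromHalves(ushort high, ushort low)
    {
        var m = new Mask32();
        m.UShort1 = high; // occupies offsets 2..3
        m.UShort2 = low;  // occupies offsets 0..1
        return m;
    }

    public static void Main()
    {
        Mask32 m = FromHalves(0x1122, 0x3344);
        // The byte fields alias the same four bytes as the ushort fields.
        Console.WriteLine($"{m.Byte1:X2} {m.Byte2:X2} {m.Byte3:X2} {m.Byte4:X2}");
        // On a little-endian machine this prints: 11 22 33 44
    }
}
```

On a big-endian machine the bytes inside each ushort would be swapped, so the field names only match the "n-th byte" numbering under the little-endian assumption.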

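Tests

I wanted to test the performance of this struct. In particular, I wanted to see whether it would let me get the individual bytes just as quickly as regular bitwise arithmetic: (i >> 8) & 0xFF (to get the 3rd byte, for example).

Below is the benchmark I came up with: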

public unsafe class MyBenchmark {

    const int count = 50000;

    [Benchmark(Baseline = true)]
    public static void Direct() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            //var b1 = i.Byte1();
            //var b2 = i.Byte2();
            var b3 = i.Byte3();
            //var b4 = i.Byte4();
            j += b3;
        }
    }


    [Benchmark]
    public static void ViaStructPointer() {
        var j = 0;
        int i = 0;
        var s = (Mask32*)&i;
        for (; i < count; i++) {
            //var b1 = s->Byte1;
            //var b2 = s->Byte2;
            var b3 = s->Byte3;
            //var b4 = s->Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaStructPointer2() {
        var j = 0;
        int i = 0;
        for (; i < count; i++) {
            var s = *(Mask32*)&i;
            //var b1 = s.Byte1;
            //var b2 = s.Byte2;
            var b3 = s.Byte3;
            //var b4 = s.Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaStructCast() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            Mask32 m = i;
            //var b1 = m.Byte1;
            //var b2 = m.Byte2;
            var b3 = m.Byte3;
            //var b4 = m.Byte4;
            j += b3;
        }
    }

    [Benchmark]
    public static void ViaUnsafeAs() {
        var j = 0;
        for (int i = 0; i < count; i++) {
            var m = Unsafe.As<int, Mask32>(ref i);
            //var b1 = m.Byte1;
            //var b2 = m.Byte2;
            var b3 = m.Byte3;
            //var b4 = m.Byte4;
            j += b3;
        }
    }

}

Byte1(), Byte2(), Byte3(), and Byte4() are just extension methods that do get inlined and simply get the n-th byte through bitwise operations and a cast:

    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Byte1(this int it) => (byte)(it >> 24);
    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
    [DebuggerStepThrough, MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Byte4(this int it) => (byte)it;
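
As a quick illustration of what these shift-and-cast helpers return, the sketch below (an addition, not from the original post) applies them to two sample values. The class names `IntByteExtensions` and `ShiftDemo` are hypothetical, the attributes are omitted for brevity, and the code assumes the default unchecked arithmetic context, in which the (byte) casts truncate instead of throwing.

```csharp
using System;

// Copies of the four helpers above, without the attributes.
public static class IntByteExtensions
{
    public static byte Byte1(this int it) => (byte)(it >> 24);
    public static byte Byte2(this int it) => (byte)((it >> 16) & 0xFF);
    public static byte Byte3(this int it) => (byte)((it >> 8) & 0xFF);
    public static byte Byte4(this int it) => (byte)it;
}

public static class ShiftDemo
{
    public static void Main()
    {
        int value = 0x11223344;
        Console.WriteLine($"{value.Byte1():X2} {value.Byte2():X2} {value.Byte3():X2} {value.Byte4():X2}");
        // prints: 11 22 33 44

        // For a negative int, >> is an arithmetic shift that sign-extends,
        // but the truncating cast to byte still keeps only the top 8 bits.
        int negative = unchecked((int)0x80FFFFFF);
        Console.WriteLine($"{negative.Byte1():X2}");
        // prints: 80
    }
}
```

Unlike the struct overlay, these helpers work on the numeric value, so their results do not depend on the machine's byte order.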

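EDIT: Fixed the code to make sure the variables are actually used. Also commented out 3 of the 4 variables, to really test the struct cast / member access rather than the use of the variables.

Results

I ran these in an optimized Release build on x64.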

Intel Core i7-3770K CPU 3.50GHz (Ivy Bridge), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3410223 Hz, Resolution=293.2360 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.6.1086.0


            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  14.47 us | 0.3314 us | 0.2938 us |   1.00 |     0.00 |
  ViaStructPointer | 111.32 us | 0.6481 us | 0.6062 us |   7.70 |     0.15 |
 ViaStructPointer2 | 102.31 us | 0.7632 us | 0.7139 us |   7.07 |     0.14 |
     ViaStructCast |  29.00 us | 0.3159 us | 0.2800 us |   2.01 |     0.04 |
       ViaUnsafeAs |  14.32 us | 0.0955 us | 0.0894 us |   0.99 |     0.02 |

EDIT: New results after fixing the code:

            Method |      Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |----------:|----------:|----------:|-------:|---------:|
            Direct |  57.51 us | 1.1070 us | 1.0355 us |   1.00 |     0.00 |
  ViaStructPointer | 203.20 us | 3.9830 us | 3.5308 us |   3.53 |     0.08 |
 ViaStructPointer2 | 198.08 us | 1.8411 us | 1.6321 us |   3.45 |     0.06 |
     ViaStructCast |  79.68 us | 1.5478 us | 1.7824 us |   1.39 |     0.04 |
       ViaUnsafeAs |  57.01 us | 0.8266 us | 0.6902 us |   0.99 |     0.02 |

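Questions

The benchmark results surprised me, which is why I have a few questions:

EDIT: Fewer questions remain after changing the code so that the variables are actually used.

  1. Why is the pointer stuff so slow?
  2. Why does the cast take twice as long as the baseline case? Aren't implicit/explicit operators inlined?
  3. How come the new System.Runtime.CompilerServices.Unsafe package (v. 4.5.0) is so fast? I thought it would involve at least a method call...
  4. More generally, how can I make an essentially zero-cost struct that simply acts as a "window" onto some memory, or onto a biggish primitive type like UInt64, so that I can manipulate / read that memory more effectively? What is the best practice here?

Best answer

The answer appears to be that the JIT compiler can apply certain optimisations better when you use Unsafe.As().

Unsafe.As() is implemented very simply, like this: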

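// Conceptual C#: the incoming ref is returned unchanged and only its static type differs.
// (C# itself rejects this type mismatch; the real method body is the IL shown further down.)
public static ref TTo As<TFrom, TTo>(ref TFrom source)
{
    return ref source;
}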

That's it!
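
Because the method hands back the same reference under a different type, a write through the returned ref lands in the original variable. The following is a minimal sketch of that aliasing (an addition, not from the original answer). `AliasDemo` and `WriteLowByte` are hypothetical names, the struct is a trimmed copy, and the printed value assumes a little-endian machine.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Trimmed copy of the struct: fields only.
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public ushort UShort1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public ushort UShort2;
    [FieldOffset(0)] public byte Byte4;
}

public static class AliasDemo
{
    // Hypothetical helper: overwrites the byte at offset 0 of an int
    // through a Mask32 view of the same storage.
    public static int WriteLowByte(int value, byte newLow)
    {
        // No copy is made here: 'view' refers to the storage of 'value'.
        ref Mask32 view = ref Unsafe.As<int, Mask32>(ref value);
        view.Byte4 = newLow;
        return value;
    }

    public static void Main()
    {
        Console.WriteLine($"{WriteLowByte(0x11223344, 0xFF):X8}");
        // On a little-endian machine this prints: 112233FF
    }
}
```

Note that the benchmark and test program in this post assign the result to a plain `var`, which copies the struct out of the reference; the `ref` local above is what keeps the view aliased to the int.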

Here is a test program I wrote to compare that with casting:

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace Demo
{
    [StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
    public struct Mask32
    {
        [FieldOffset(3)]
        public byte Byte1;
        [FieldOffset(2)]
        public ushort UShort1;
        [FieldOffset(2)]
        public byte Byte2;
        [FieldOffset(1)]
        public byte Byte3;
        [FieldOffset(0)]
        public ushort UShort2;
        [FieldOffset(0)]
        public byte Byte4;
    }

    public static unsafe class Program
    {
        static int count = 50000000;

        public static int ViaStructPointer()
        {
            int total = 0;

            for (int i = 0; i < count; i++)
            {
                var s = (Mask32*)&i;
                total += s->Byte1;
            }

            return total;
        }

        public static int ViaUnsafeAs()
        {
            int total = 0;

            for (int i = 0; i < count; i++)
            {
                var m = Unsafe.As<int, Mask32>(ref i);
                total += m.Byte1;
            }

            return total;
        }

        public static void Main(string[] args)
        {
            var sw = new Stopwatch();

            sw.Restart();
            ViaStructPointer();
            Console.WriteLine("ViaStructPointer took " + sw.Elapsed);

            sw.Restart();
            ViaUnsafeAs();
            Console.WriteLine("ViaUnsafeAs took " + sw.Elapsed);
        }
    }
}

The results I get on my PC (x64 release build) are as follows:

ViaStructPointer took 00:00:00.1314279
ViaUnsafeAs took 00:00:00.0249446

As you can see, ViaUnsafeAs is indeed much quicker.

So let's look at what the compiler has generated:

public static unsafe int ViaStructPointer()
{
    int total = 0;
    for (int i = 0; i < Program.count; i++)
    {
        total += (*(Mask32*)(&i)).Byte1;
    }
    return total;
}

public static int ViaUnsafeAs()
{
    int total = 0;
    for (int i = 0; i < Program.count; i++)
    {
        total += (Unsafe.As<int, Mask32>(ref i)).Byte1;
    }
    return total;
}

OK, there's nothing obvious there. But what about the IL?

.method public hidebysig static int32 ViaStructPointer () cil managed
{
    .locals init (
        [0] int32 total,
        [1] int32 i,
        [2] valuetype Demo.Mask32* s
    )

    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.0
    IL_0003: stloc.1
    IL_0004: br.s IL_0017
    .loop
    {
        IL_0006: ldloca.s i
        IL_0008: conv.u
        IL_0009: stloc.2
        IL_000a: ldloc.0
        IL_000b: ldloc.2
        IL_000c: ldfld uint8 Demo.Mask32::Byte1
        IL_0011: add
        IL_0012: stloc.0
        IL_0013: ldloc.1
        IL_0014: ldc.i4.1
        IL_0015: add
        IL_0016: stloc.1

        IL_0017: ldloc.1
        IL_0018: ldsfld int32 Demo.Program::count
        IL_001d: blt.s IL_0006
    }

    IL_001f: ldloc.0
    IL_0020: ret
}

.method public hidebysig static int32 ViaUnsafeAs () cil managed
{
    .locals init (
        [0] int32 total,
        [1] int32 i,
        [2] valuetype Demo.Mask32 m
    )

    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.0
    IL_0003: stloc.1
    IL_0004: br.s IL_0020
    .loop
    {
        IL_0006: ldloca.s i
        IL_0008: call valuetype Demo.Mask32& [System.Runtime.CompilerServices.Unsafe]System.Runtime.CompilerServices.Unsafe::As<int32, valuetype Demo.Mask32>(!!0&)
        IL_000d: ldobj Demo.Mask32
        IL_0012: stloc.2
        IL_0013: ldloc.0
        IL_0014: ldloc.2
        IL_0015: ldfld uint8 Demo.Mask32::Byte1
        IL_001a: add
        IL_001b: stloc.0
        IL_001c: ldloc.1
        IL_001d: ldc.i4.1
        IL_001e: add
        IL_001f: stloc.1

        IL_0020: ldloc.1
        IL_0021: ldsfld int32 Demo.Program::count
        IL_0026: blt.s IL_0006
    }

    IL_0028: ldloc.0
    IL_0029: ret
}

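Aha! The only difference here is this:

ViaStructPointer: conv.u
ViaUnsafeAs:      call valuetype Demo.Mask32& [System.Runtime.CompilerServices.Unsafe]System.Runtime.CompilerServices.Unsafe::As<int32, valuetype Demo.Mask32>(!!0&)
                  ldobj Demo.Mask32

On the face of it, you would expect conv.u to be faster than the two instructions used for Unsafe.As. However, the JIT compiler seems able to optimise those two instructions much better than the single conv.u.

It's reasonable to ask why that is. Unfortunately, I don't have an answer to that yet! I'm almost certain that the call to Unsafe::As<>() is being inlined by the JITTER, and then further optimised by the JIT.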

There is some information about the Unsafe class' optimisations here.

Note that the IL generated for Unsafe.As<> is simply this:

.method public hidebysig static !!TTo& As<TFrom, TTo> (
        !!TFrom& source
    ) cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = (
        01 00 00 00
    )
    IL_0000: ldarg.0
    IL_0001: ret
}

And now I think it becomes clearer why the JITTER can optimise that so well.
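
The fourth question, about a "window" onto a larger block of memory, is not covered by the answer. One possible direction, offered here only as an assumption and not as something the original post proposes or measures, is `MemoryMarshal.Cast`, which reinterprets a span of one unmanaged struct type as a span of another without copying. `WindowDemo` and `SumByte3` are hypothetical names, and the byte numbering again assumes a little-endian machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Trimmed copy of the struct: fields only.
[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 4)]
public struct Mask32
{
    [FieldOffset(3)] public byte Byte1;
    [FieldOffset(2)] public ushort UShort1;
    [FieldOffset(2)] public byte Byte2;
    [FieldOffset(1)] public byte Byte3;
    [FieldOffset(0)] public ushort UShort2;
    [FieldOffset(0)] public byte Byte4;
}

public static class WindowDemo
{
    // Hypothetical helper: views a block of ints as Mask32 values
    // and sums the byte stored at offset 1 of each element.
    public static int SumByte3(ReadOnlySpan<int> values)
    {
        // Reinterprets the same memory; no elements are copied.
        ReadOnlySpan<Mask32> masks = MemoryMarshal.Cast<int, Mask32>(values);
        int total = 0;
        foreach (Mask32 m in masks)
        {
            total += m.Byte3;
        }
        return total;
    }

    public static void Main()
    {
        int[] data = { 0x00000100, 0x00000200, 0x00FF0300 };
        Console.WriteLine(SumByte3(data));
        // On a little-endian machine this prints: 6
    }
}
```

Whether this is actually zero-cost on a given runtime would have to be measured in the same way as the benchmarks above; the sketch only shows the shape of the API.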

Regarding "c# - Why is casting a struct via a pointer slow, while Unsafe.As is fast?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50870942/
