
c# - Improving binary serialization performance for large lists of structs

Reposted. Author: IT王子. Updated: 2023-10-29 04:39:30

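I have a struct that holds 3D coordinates in 3 ints. In a test, I put together a List<> of 1 million random points and then used binary serialization to a memory stream.

The memory stream comes out at about 21 MB, which seems very inefficient: 1,000,000 points * 3 coordinates * 4 bytes is 12,000,000 bytes, so the minimum should be around 11 MB.

It also takes about 3 seconds on my test rig.

Any ideas for improving performance and/or size?

(If it helps, I don't have to keep the ISerializable interface; I could write directly to a memory stream.)

Edit - Based on the answers below, I put together a serialization showdown comparing BinaryFormatter, 'raw' BinaryWriter and Protobuf: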

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using ProtoBuf;

namespace asp_heatmap.test
{
    [Serializable()] // For .NET BinaryFormatter
    [ProtoContract] // For Protobuf
    public class Coordinates : ISerializable
    {
        [Serializable()]
        [ProtoContract]
        public struct CoOrd
        {
            public CoOrd(int x, int y, int z)
            {
                this.x = x;
                this.y = y;
                this.z = z;
            }
            [ProtoMember(1)]
            public int x;
            [ProtoMember(2)]
            public int y;
            [ProtoMember(3)]
            public int z;
        }

        internal Coordinates()
        {
        }

        [ProtoMember(1)]
        public List<CoOrd> Coords = new List<CoOrd>();

        public void SetupTestArray()
        {
            Random r = new Random();
            for (int i = 0; i < 1000000; i++)
            {
                Coords.Add(new CoOrd(r.Next(), r.Next(), r.Next()));
            }
        }

        #region Using Framework Binary Formatter Serialization

        void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Coords", this.Coords);
        }

        internal Coordinates(SerializationInfo info, StreamingContext context)
        {
            this.Coords = (List<CoOrd>)info.GetValue("Coords", typeof(List<CoOrd>));
        }

        #endregion

        #region 'Raw' Binary Writer serialization

        public MemoryStream RawSerializeToStream()
        {
            MemoryStream stream = new MemoryStream(Coords.Count * 3 * 4 + 4);
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(Coords.Count);
            foreach (CoOrd point in Coords)
            {
                writer.Write(point.x);
                writer.Write(point.y);
                writer.Write(point.z);
            }
            return stream;
        }

        public Coordinates(MemoryStream stream)
        {
            using (BinaryReader reader = new BinaryReader(stream))
            {
                int count = reader.ReadInt32();
                Coords = new List<CoOrd>(count);
                for (int i = 0; i < count; i++)
                {
                    Coords.Add(new CoOrd(reader.ReadInt32(), reader.ReadInt32(), reader.ReadInt32()));
                }
            }
        }

        #endregion
    }

    [TestClass]
    public class SerializationTest
    {
        [TestMethod]
        public void TestBinaryFormatter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            // Serialize to memory stream
            MemoryStream mStream = new MemoryStream();
            BinaryFormatter bformatter = new BinaryFormatter();
            bformatter.Serialize(mStream, c);
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = (Coordinates)bformatter.Deserialize(mStream);
            Console.Write(c2.Coords.Count);

            mStream.Close();
        }

        [TestMethod]
        public void TestBinaryWriter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = c.RawSerializeToStream();
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = new Coordinates(mStream);
            Console.Write(c2.Coords.Count);
        }

        [TestMethod]
        public void TestProtoBufV2()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = new MemoryStream();
            ProtoBuf.Serializer.Serialize(mStream, c);
            Console.WriteLine("Length : {0}", mStream.Length);

            mStream.Position = 0;
            Coordinates c2 = ProtoBuf.Serializer.Deserialize<Coordinates>(mStream);
            Console.Write(c2.Coords.Count);
        }
    }
}

Results (note: PB v2.0.0.423 beta)

                 | Serialize | Ser + Deserialize | Size
-----------------+-----------+-------------------+--------
BinaryFormatter  | 2.89s     | 26.00s !!!        | 21.0 MB
ProtoBuf v2      | 0.52s     | 0.83s             | 18.7 MB
Raw BinaryWriter | 0.27s     | 0.36s             | 11.4 MB

Obviously this only looks at speed/size and doesn't take any other factors into account.
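The ProtoBuf size in the table can be roughly explained by how protobuf encodes integers. The following is a minimal sketch, not taken from the original post: it assumes the standard protobuf wire format (one tag byte per field, base-128 varints, each list element written as a length-prefixed sub-message) and ignores any library-specific details. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper: estimates protobuf bytes per point for non-negative ints.
public static class ProtoSizeEstimate
{
    // Number of bytes a base-128 varint needs for the given value.
    public static int VarintLength(uint value)
    {
        int length = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            length++;
        }
        return length;
    }

    // Estimated size of one point written as a length-prefixed sub-message.
    // Assumes x, y and z are non-negative, as Random.Next() values are.
    public static int PointSize(int x, int y, int z)
    {
        // Three fields, each with a 1-byte tag followed by a varint payload.
        int body = 3 + VarintLength((uint)x) + VarintLength((uint)y) + VarintLength((uint)z);
        // Outer element: 1-byte tag + varint length prefix + body.
        return 1 + VarintLength((uint)body) + body;
    }

    // Average estimated size over random points like those in the test data.
    public static double AverageBytesPerPoint(int samples, int seed)
    {
        var r = new Random(seed);
        long total = 0;
        for (int i = 0; i < samples; i++)
        {
            total += PointSize(r.Next(), r.Next(), r.Next());
        }
        return (double)total / samples;
    }
}
```

Because Random.Next() values are spread over the whole non-negative int range, most of them need 5 varint bytes, so under these assumptions the average should come out near 19.6 bytes per point, which is in line with the 18.7 MB measured for a million points.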

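Best answer

Binary serialization using BinaryFormatter includes type information in the bytes it generates. This takes up additional space. It is useful, for example, when you don't know what data structure to expect at the other end.

In your case, you know what format the data has at both ends, and it doesn't sound like that will change. So you can write simple encode and decode methods. Your CoOrd struct no longer needs to be serializable either.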

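I would use System.IO.BinaryReader and System.IO.BinaryWriter, then loop over the CoOrd instances and read/write each one's x, y and z field values to the stream. Note that BinaryWriter.Write(int) always writes exactly 4 bytes, so this comes to 12 bytes per point, or about 11.4 MB for a million points. Getting below that would need a variable-length integer encoding, which only pays off if many of your numbers are small.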

Like this:

using (var writer = new BinaryWriter(stream)) {
    // write the number of items so we know how many to read out
    writer.Write(points.Count);
    // write three ints per point (x, y, z are the fields of the CoOrd struct)
    foreach (var point in points) {
        writer.Write(point.x);
        writer.Write(point.y);
        writer.Write(point.z);
    }
}

To read from the stream:

List<CoOrd> points;
using (var reader = new BinaryReader(stream)) {
    var count = reader.ReadInt32();
    points = new List<CoOrd>(count);
    for (int i = 0; i < count; i++) {
        var x = reader.ReadInt32();
        var y = reader.ReadInt32();
        var z = reader.ReadInt32();
        points.Add(new CoOrd(x, y, z));
    }
}
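If many coordinates are small, the same loops could write each int in a variable-length form instead of a fixed 4 bytes. The sketch below is an illustration only, not part of the original answer: it hand-rolls a 7-bit encoding (BinaryWriter has a similar Write7BitEncodedInt method, which is public since .NET 5), and the VarintPoints class and the stand-in CoOrd struct are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Stand-in for the question's CoOrd struct, without the serialization attributes.
public struct CoOrd
{
    public int x, y, z;
    public CoOrd(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical helper: variable-length variant of the write/read loops.
public static class VarintPoints
{
    // Writes the value in 7-bit groups, low bits first; a set high bit means more bytes follow.
    static void WriteVarint(BinaryWriter writer, int value)
    {
        uint v = (uint)value;
        while (v >= 0x80)
        {
            writer.Write((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        writer.Write((byte)v);
    }

    // Reads a value written by WriteVarint.
    static int ReadVarint(BinaryReader reader)
    {
        int result = 0;
        int shift = 0;
        byte b;
        do
        {
            if (shift >= 35) throw new FormatException("Varint is too long for a 32-bit int.");
            b = reader.ReadByte();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static byte[] Encode(List<CoOrd> points)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream))
            {
                // write the number of items first, then three varints per point
                WriteVarint(writer, points.Count);
                foreach (var p in points)
                {
                    WriteVarint(writer, p.x);
                    WriteVarint(writer, p.y);
                    WriteVarint(writer, p.z);
                }
            }
            // MemoryStream.ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    public static List<CoOrd> Decode(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int count = ReadVarint(reader);
            var points = new List<CoOrd>(count);
            for (int i = 0; i < count; i++)
            {
                int x = ReadVarint(reader);
                int y = ReadVarint(reader);
                int z = ReadVarint(reader);
                points.Add(new CoOrd(x, y, z));
            }
            return points;
        }
    }
}
```

The trade-off depends on the data: values below 128 take one byte each, but large or negative values take 5 bytes, so for the uniformly random test data in the question this encoding would probably come out larger than the fixed 12 bytes per point.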

Regarding "c# - Improving binary serialization performance for large lists of structs", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6478579/
