
c# - How to cut data from a file?


I have a file that isn't huge, about 140 MB, containing CAN data recorded over a period of time; the total duration is about 29:17:00 [mm:ss:ms]. What I need is to split that file, or better, copy a specific time range of the data into a new file.

For example, from 10:00:00 to 20:30:00.

Any ideas on how to approach this?

Here is what I have so far for reading the header:

private void test(string fileName)
{
    FileStream fs;

    fs = File.OpenRead(fileName);
    long fileSize = fs.Length;
    bool extendedFileFormat = DriveRecFiles.IsFileDRX(replayCtrl.SourceFilename);

    Int64 tmpByte = 0;
    Int64 tmpInt64 = 0;

    #region TimeStampFrequency
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    SourceTimingClockFrequency = tmpInt64;
    #endregion

    #region StartTimeStamp
    tmpInt64 = 0;
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    sourceTimingBeginStampValue = tmpInt64;
    #endregion

    #region Last TimeStamp
    fs.Position = fs.Length - 8;
    tmpInt64 = 0;
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    TimeStampEnd = tmpInt64;

    // This is the conversion from timestamp to time in ms
    int FileLengthTime = (int)((1000 * (TimeStampEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
    #endregion
}
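For comparison only (not part of the question's code): the same three little-endian reads can be written with a BinaryReader, which reads an Int64 in a single call and uses the same byte order as the manual shift loop above. This is a sketch assuming the same file layout, i.e. two Int64 values at the start of the file and one in the last 8 bytes:

using (var fs = File.OpenRead(fileName))
using (var reader = new BinaryReader(fs))
{
    long sourceTimingClockFrequency = reader.ReadInt64();   // bytes 0..7
    long sourceTimingBeginStampValue = reader.ReadInt64();  // bytes 8..15

    fs.Position = fs.Length - 8;
    long timeStampEnd = reader.ReadInt64();                 // last 8 bytes of the file

    // same conversion from timestamp to time in ms as above
    int fileLengthTime = (int)((1000 * (timeStampEnd - sourceTimingBeginStampValue)) / sourceTimingClockFrequency);
}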

Now I'm stuck and don't know how to proceed. Should I loop over the file and compare every timestamp, something like:

Say I set a start time of 1,000,000 ms and an end time of 1,700,000 ms:

int begintime = 1000000;
int endtime = 1700000;
int startPosition = 0;
int endPosition = 0;
long currentTimeStepEnd = 0;
int currentTime = 0;
int start = -1;
int end = -1;
for (int i = 8; i <= fs.Length - 8; i++)
{
    fs.Position = i;
    tmpInt64 = 0;
    for (int j = 0; j < 8; j++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << j * 8;
    }
    currentTimeStepEnd = tmpInt64;
    currentTime = (int)((1000 * (currentTimeStepEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
    if (startPosition == 0) start = currentTime.CompareTo(begintime);
    if (endPosition == 0) end = currentTime.CompareTo(endtime);
    if (start == 0) startPosition = i;
    if (end == 0) endPosition = i;
    if ((startPosition != 0) && (endPosition != 0)) break;
    i += 47; // skip to the next record (48-byte step including the loop's i++)
}

And then copy the result to a file.

I don't know whether that's the best way. Also, I'd like to add a slider for the start time and another for the end time, with a 1 ms step, and I don't think the approach above is efficient if every new slider value has to be compared against the timestamps all over again. Open and close the FileStream every time?
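One possible improvement (a sketch of mine, not something from the question): since the records appear to be a fixed 48 bytes (that is what the i += 47 jump implies), you can compute a record's file offset directly and binary-search on its timestamp instead of scanning linearly. ReadTimestampAt and FindRecordIndexForTime are hypothetical helpers, and the header size, record size, and timestamp position within a record are assumptions that would have to match the real file layout. A single FileStream can stay open across slider updates:

// Sketch only: assumes a 16-byte header followed by fixed-size 48-byte records,
// with each record's timestamp stored little-endian in its last 8 bytes.
const int HeaderSize = 16;   // assumption
const int RecordSize = 48;   // assumption based on the i += 47 step above

static long ReadTimestampAt(FileStream fs, long recordIndex)
{
    var buffer = new byte[8];
    fs.Position = HeaderSize + recordIndex * RecordSize + (RecordSize - 8);
    if (fs.Read(buffer, 0, 8) != 8) throw new EndOfStreamException();
    return BitConverter.ToInt64(buffer, 0); // little-endian, same as the manual shift loop
}

// Finds the first record whose time (in ms) is >= targetMs.
static long FindRecordIndexForTime(FileStream fs, int targetMs, long beginStamp, long clockFrequency)
{
    long lo = 0;
    long hi = (fs.Length - HeaderSize) / RecordSize - 1;
    while (lo < hi)
    {
        long mid = (lo + hi) / 2;
        long stamp = ReadTimestampAt(fs, mid);
        int timeMs = (int)((1000 * (stamp - beginStamp)) / clockFrequency);
        if (timeMs < targetMs) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

Because each lookup touches only log2(n) records, recomputing the start/end positions for a new slider value is cheap, and there is no need to reopen the stream.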

Best Answer

This is a partial answer. I can read your data chunk by chunk. Once you have that, you can decide how to write it back out to a set of smaller files (using BinaryWriters over FileStreams). I'll leave that to you. As written, though, this reads everything.

Update: there's more of an answer below (I added a WriteStruct method, and something closer to what you asked for).

I started by defining two structs with a very deliberate layout. Since the header just contains two consecutive 64-bit values, I can simply use LayoutKind.Sequential:

[StructLayout(LayoutKind.Sequential)]
public struct CanHeader {
    public UInt64 TimeStampFrequency;
    public UInt64 TimeStamp;
}

However, the Chunk struct mixes 32-bit and 64-bit members. If I laid it out sequentially, the framework would insert 4 bytes of padding to align the UInt64s. So I need to use LayoutKind.Explicit:

[StructLayout(LayoutKind.Explicit)]
public struct CanChunk {
    [FieldOffset(0)]  public UInt32 ReturnReadValue;
    [FieldOffset(4)]  public UInt32 CanTime;
    [FieldOffset(8)]  public UInt32 Can;
    [FieldOffset(12)] public UInt32 Ident;
    [FieldOffset(16)] public UInt32 DataLength;
    [FieldOffset(20)] public UInt64 Data;
    [FieldOffset(28)] public UInt32 Res;
    [FieldOffset(32)] public UInt64 TimeStamp;
}
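As a quick way to see the padding mentioned above (my own check, not part of the original answer), you can compare Marshal.SizeOf for the explicit layout against a hypothetical sequential copy of the same fields:

// Hypothetical struct, for comparison only: the same fields laid out sequentially.
[StructLayout(LayoutKind.Sequential)]
public struct CanChunkSequential {
    public UInt32 ReturnReadValue;
    public UInt32 CanTime;
    public UInt32 Can;
    public UInt32 Ident;
    public UInt32 DataLength;
    public UInt64 Data;
    public UInt32 Res;
    public UInt64 TimeStamp;
}

// Marshal.SizeOf(typeof(CanChunk))           -> 40, matching the file's record size
// Marshal.SizeOf(typeof(CanChunkSequential)) -> 48, because 4 bytes of padding are
//                                                inserted before each UInt64 field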

Then I looked at @FelixK's answer to C# array within a struct and modified his ReadStruct extension method to fit my needs:

private static (T, bool) ReadStruct<T>(this BinaryReader reader) where T : struct {
    var len = Marshal.SizeOf(typeof(T));
    Byte[] buffer = reader.ReadBytes(len);

    if (buffer.Length < len) {
        return (default(T), false);
    }
    //otherwise
    GCHandle handle = default(GCHandle);
    try {
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        return ((T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T)), true);
    } finally {
        if (handle.IsAllocated)
            handle.Free();
    }
}

It returns a tuple whose first member is the struct instance just read from the file and whose second member is a flag indicating whether more reading is needed (true means "keep reading"). It also uses BinaryReader.ReadBytes rather than BinaryReader.Read.

With all of that in place, I can now read the data. My first attempt wrote the contents out to the console - but writing out 140 MB takes a long time. Still, if you do that, you can see the data advancing the way you'd expect (the timestamps keep increasing).

public static void ReadBinary() {
    using (var stream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
        using (var reader = new BinaryReader(stream)) {
            var headerTuple = reader.ReadStruct<CanHeader>();
            Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016} TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
            bool stillWorking;
            UInt64 totalSize = 0L;
            var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
            do {
                var chunkTuple = reader.ReadStruct<CanChunk>();
                stillWorking = chunkTuple.Item2;
                if (stillWorking) {
                    var chunk = chunkTuple.Item1;
                    //Console.WriteLine($"{chunk.ReturnReadValue:x08} {chunk.CanTime:x08} {chunk.Can:x08} {chunk.Ident:x08} {chunk.DataLength:x08} {chunk.Data:x016} {chunk.Res:x04} {chunk.TimeStamp:x016}");
                    totalSize += chunkSize;
                }
            } while (stillWorking);
            Console.WriteLine($"Total Size: 0x{totalSize:x016}");
        }
    }
}

If I uncomment the Console.WriteLine statement, the output starts out looking like this:

[Header] TimeStampFrequency: 00000000003408e2  TimeStamp: 000002a1a1bf04bb
00000001 a1bf04bb 00000020 000002ff 00000008 0007316be2c20350 0000 000002a1a1bf04bb
00000001 a1bf04be 00000020 00000400 00000008 020a011abf80138e 0000 000002a1a1bf04be
00000001 a1bf04c0 00000020 00000400 00000008 8000115f84f09f12 0000 000002a1a1bf04c0
00000001 a1bf04c2 00000020 00000401 00000008 0c1c1205690d81f8 0000 000002a1a1bf04c2
00000001 a1bf04c3 00000020 00000401 00000007 001fa2420000624d 0000 000002a1a1bf04c3
00000001 a1bf04c5 00000020 00000402 00000008 0c2a5a95b99d0286 0000 000002a1a1bf04c5
00000001 a1bf04c7 00000020 00000402 00000007 001faa6000003c49 0000 000002a1a1bf04c7
00000001 a1bf04c8 00000020 00000403 00000008 0c1c0c06840e02d2 0000 000002a1a1bf04c8
00000001 a1bf04ca 00000020 00000403 00000007 001fad4200006c5d 0000 000002a1a1bf04ca
00000001 a1bf04cc 00000020 00000404 00000008 0c1c0882800b82d8 0000 000002a1a1bf04cc
00000001 a1bf04cd 00000020 00000404 00000007 001fad8200009cd1 0000 000002a1a1bf04cd
00000001 a1bf04cf 00000020 00000405 00000008 0c1c0f04850cc2de 0000 000002a1a1bf04cf
00000001 a1bf04d0 00000020 00000405 00000007 001fada20000766f 0000 000002a1a1bf04d0
00000001 a1bf04d2 00000020 00000406 00000008 0c1bd80c4e13831a 0000 000002a1a1bf04d2
00000001 a1bf04d3 00000020 00000406 00000007 001faf800000505b 0000 000002a1a1bf04d3
00000001 a1bf04d5 00000020 00000407 00000008 0c23d51049974330 0000 000002a1a1bf04d5
00000001 a1bf04d6 00000020 00000407 00000007 001fb02000004873 0000 000002a1a1bf04d6
00000001 a1bf04d8 00000020 00000408 00000008 0c1c0a8490cc44ba 0000 000002a1a1bf04d8
00000001 a1bf04da 00000020 00000408 00000007 001fb762000088bf 0000 000002a1a1bf04da
00000001 a1bf04db 00000020 00000409 00000008 0c1c0603a0cbc4c0 0000 000002a1a1bf04db
00000001 a1bf04df 00000020 00000409 00000007 001fb76000008ee5 0000 000002a1a1bf04df
00000001 a1bf04e0 00000020 0000040a 00000008 0c23f70c5b9544cc 0000 000002a1a1bf04e0
00000001 a1bf04e2 00000020 0000040a 00000007 001fb7820000565f 0000 000002a1a1bf04e2
00000001 a1bf04e3 00000020 0000040b 00000008 0c1bf3049b4cc502 0000 000002a1a1bf04e3
00000001 a1bf04e5 00000020 0000040b 00000007 001fb82200007eab 0000 000002a1a1bf04e5

and it ends like this:

Total Size: 0x00000000085ae0a8

which is 140,173,480 in decimal. That's exactly what I expected.
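A quick arithmetic check (mine, not part of the original answer): the total divides evenly into 40-byte chunks, which is what makes it "exactly as expected":

// 0x85ae0a8 = 140,173,480 bytes of chunk data
Console.WriteLine(0x85ae0a8 % 40);   // 0  -> a whole number of chunks
Console.WriteLine(0x85ae0a8 / 40);   // 3504337 chunks of 40 bytes each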

Update:

To get closer to what you asked for, I took the code from the ReadStruct method and used it to create a corresponding WriteStruct method:

private static void WriteStruct<T>(this BinaryWriter writer, T obj) where T : struct {
    var len = Marshal.SizeOf(typeof(T));
    var buffer = new byte[len];

    GCHandle handle = default(GCHandle);
    try {
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        Marshal.StructureToPtr(obj, handle.AddrOfPinnedObject(), false);
    } finally {
        if (handle.IsAllocated)
            handle.Free();
    }
    writer.Write(buffer);
}

With that, I could also modify my original code to read all the data and write a selected portion of it out to another file. In the code below, I read in "chunks" until I hit one whose CanTime is evenly divisible by 10,000. When that happens, I create a new CanHeader struct (I'm not sure what belongs in it - but you should be). Then I create an output FileStream (i.e., a file to write to) and a BinaryWriter. I write the header to the FileStream and then write the next 5000 chunks I read to that file. In your case, you would use the data in the chunk stream to decide what you want to do:

using (var readStream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
    using (var reader = new BinaryReader(readStream)) {
        var headerTuple = reader.ReadStruct<CanHeader>();
        Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016} TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
        bool stillWorking;
        UInt64 totalSize = 0L;
        UInt64 recordCount = 0L;
        var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
        var chunksWritten = 0;
        FileStream writeStream = null;
        BinaryWriter writer = null;
        var writingChucks = false;
        var allDone = false;
        try {
            do {
                var chunkTuple = reader.ReadStruct<CanChunk>();
                stillWorking = chunkTuple.Item2;
                if (stillWorking) {
                    var chunk = chunkTuple.Item1;
                    if (!writingChucks && chunk.CanTime % 10_000 == 0) {
                        writingChucks = true;
                        var writeHeader = new CanHeader {
                            TimeStamp = chunk.TimeStamp,
                            TimeStampFrequency = headerTuple.Item1.TimeStampFrequency
                        };
                        writeStream = new FileStream("Output.dr2", FileMode.Create, FileAccess.Write);
                        writer = new BinaryWriter(writeStream);
                        writer.WriteStruct(writeHeader);
                    }
                    if (writingChucks && !allDone) {
                        writer.WriteStruct(chunk);
                        ++chunksWritten;
                        if (chunksWritten >= 5000) {
                            allDone = true;
                        }
                    }
                    totalSize += chunkSize;
                    ++recordCount;
                }
            } while (stillWorking);
        } finally {
            writer?.Dispose();
            writeStream?.Dispose();
        }
        Console.WriteLine($"Total Size: 0x{totalSize:x016} Record Count: {recordCount} Records Written: {chunksWritten}");
    }
}

When it finished, I could see that 5000 records had been written to the file (it is 200,016 bytes long - 5000 records of 40 bytes each, preceded by a 16-byte header), and that the first record's CanTime is 0xa3a130d0 (or 2,745,250,000 - i.e., divisible by 10,000). Everything is as I expected.
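To tie this back to the original question (a sketch of mine, not part of the accepted answer), the same read/write loop can select chunks by time instead of by the divisible-by-10,000 condition. Here beginMs and endMs are hypothetical parameters, the time conversion mirrors the formula from the question, and the CanHeader/CanChunk structs plus the ReadStruct/WriteStruct extension methods above are assumed to be available in the same class:

// Sketch: copy only the chunks whose time (relative to the header timestamp,
// converted to ms with the clock frequency) lies in [beginMs, endMs].
public static void CutByTime(string inputFile, string outputFile, long beginMs, long endMs) {
    using (var readStream = new FileStream(inputFile, FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(readStream))
    using (var writeStream = new FileStream(outputFile, FileMode.Create, FileAccess.Write))
    using (var writer = new BinaryWriter(writeStream)) {
        var (header, headerOk) = reader.ReadStruct<CanHeader>();
        if (!headerOk) return;

        bool headerWritten = false;
        while (true) {
            var (chunk, ok) = reader.ReadStruct<CanChunk>();
            if (!ok) break;

            // same ms conversion as in the question, using the header's clock frequency
            long ms = (long)(1000UL * (chunk.TimeStamp - header.TimeStamp) / header.TimeStampFrequency);
            if (ms < beginMs) continue;   // before the window
            if (ms > endMs) break;        // past the window - timestamps only increase

            if (!headerWritten) {
                // Re-stamp the new file so its start timestamp matches the first copied
                // chunk (same assumption the answer makes about the new header's contents).
                writer.WriteStruct(new CanHeader {
                    TimeStampFrequency = header.TimeStampFrequency,
                    TimeStamp = chunk.TimeStamp
                });
                headerWritten = true;
            }
            writer.WriteStruct(chunk);
        }
    }
}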

Regarding "c# - How to cut data from a file?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53656414/
