
c# - Reading a large text file (over 4 million lines) and parsing each line in .NET

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:55:32

I have one log file for each day of the month. The files are plain text, and each line carries some information, as in the snippet below:

1?2017-06-01T00:00:00^148^3
2?myVar1^3454.33
2?myVar2^35
2?myVar3^0
1?2017-06-01T00:00:03^148^3
...

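To process and display this data, I am developing a WPF application that reads these txt files, parses the lines, and saves the data in a SQLite database. I then let the user run some basic math operations, such as the AVG of a subset.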

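Because these files are so large (each is over 300 MB and 4 million lines), I am struggling with memory usage in the ProcessLine method (as far as I can tell, the reading part is fine for now). The method never finishes, and the application puts itself into break mode.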

My code:

private bool ParseContent(string filePath)
{
    if (string.IsNullOrEmpty(FilePath) || !File.Exists(FilePath))
        return false;

    string logEntryDateTimeTemp = string.Empty;

    string[] AllLines = new string[5000000]; //only allocate memory here
    AllLines = File.ReadAllLines(filePath);
    Parallel.For(0, AllLines.Length, x =>
    {
        ProcessLine(AllLines[x], ref logEntryDateTimeTemp);
    });

    return true;
}

void ProcessLine(string line, ref string logEntryDateTimeTemp)
{
    if (string.IsNullOrEmpty(line))
        return;

    var logFields = line.Split(_delimiterChars);

    switch (logFields[0])
    {
        case "1":
            logEntryDateTimeTemp = logFields[1];
            break;
        case "2":
            LogEntries.Add(new LogEntry
            {
                Id = ItemsCount + 1,
                CurrentDateTime = logEntryDateTimeTemp,
                TagAddress = logFields[1],
                TagValue = Convert.ToDecimal(logFields[2])
            });

            ItemsCount++;
            break;
        default:
            break;
    }
}

Is there a better way?

OBS: I also tested two other ways of reading the file:

#region StreamReader
//using (StreamReader sr = File.OpenText(filePath))
//{
//    string line = String.Empty;
//    while ((line = sr.ReadLine()) != null)
//    {
//        if (string.IsNullOrEmpty(line))
//            break;

//        var logFields = line.Split(_delimiterChars);

//        switch (logFields[0])
//        {
//            case "1":
//                logEntryDateTimeTemp = logFields[1];
//                break;
//            case "2":
//                LogEntries.Add(new LogEntry
//                {
//                    Id = ItemsCount + 1,
//                    CurrentDateTime = logEntryDateTimeTemp,
//                    TagAddress = logFields[1],
//                    TagValue = Convert.ToDecimal(logFields[2])
//                });

//                ItemsCount++;
//                break;
//            default:
//                break;
//        }
//    }
//}
#endregion

#region ReadLines
//var lines = File.ReadLines(filePath, Encoding.UTF8);

//foreach (var line in lines)
//{
//    if (string.IsNullOrEmpty(line))
//        break;

//    var logFields = line.Split(_delimiterChars);

//    switch (logFields[0])
//    {
//        case "1":
//            logEntryDateTimeTemp = logFields[1];
//            break;
//        case "2":
//            LogEntries.Add(new LogEntry
//            {
//                Id = ItemsCount + 1,
//                CurrentDateTime = logEntryDateTimeTemp,
//                TagAddress = logFields[1],
//                TagValue = Convert.ToDecimal(logFields[2])
//            });

//            ItemsCount++;
//            break;
//        default:
//            break;
//    }
//}
#endregion

OBS2: I'm using Visual Studio 2017. When the application runs in debug mode, it suddenly enters break mode, and the Output window shows the following:

The CLR has been unable to transition from COM context 0xb545a8 to COM context 0xb544f0 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
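
A note on the message above: this is the text of the ContextSwitchDeadlock managed debugging assistant, which the debugger raises when an STA thread (in a WPF application, typically the UI thread) goes 60 seconds without pumping messages. One common way to avoid it is to run the long operation on a thread-pool thread and await it from the UI. The sketch below assumes that approach; `BackgroundWork`, `CountNonEmptyLinesAsync`, and the commented-out handler are hypothetical names, and the line counting is only a stand-in for the real parsing.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BackgroundWork
{
    // Runs a line-by-line pass over the file on a thread-pool thread and
    // returns the number of non-empty lines. Stand-in for the real parsing.
    public static Task<int> CountNonEmptyLinesAsync(string filePath)
    {
        return Task.Run(() =>
        {
            int count = 0;
            // File.ReadLines streams the file lazily, one line at a time.
            foreach (string line in File.ReadLines(filePath))
            {
                if (line.Length > 0)
                    count++;
            }
            return count;
        });
    }
}

// Hypothetical WPF event handler (not from the question):
// private async void LoadButton_Click(object sender, RoutedEventArgs e)
// {
//     int lines = await BackgroundWork.CountNonEmptyLinesAsync(filePath);
//     StatusText.Text = lines + " lines read";
// }
```

Because the handler awaits instead of blocking, the UI thread keeps pumping messages while the file is processed.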

Best answer

Try using a StreamReader instead of loading the whole file into memory at once:

using (System.IO.StreamReader sr = new System.IO.StreamReader(filePath))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        //..
    }
}
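
To make the `//..` part concrete, here is a minimal sketch of how the parsing from the question could sit inside such a loop. It is not the asker's or the answerer's code. It assumes that the delimiters are `?` and `^` (inferred from the sample lines, since `_delimiterChars` is not shown) and that `LogEntry` has the four properties used in the question; `LogParser` and `ReadEntries` are hypothetical names. The loop is sequential on purpose: each "2" line depends on the most recent "1" line, so the order of lines matters.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical record type mirroring the fields used in the question.
public sealed class LogEntry
{
    public int Id { get; set; }
    public string CurrentDateTime { get; set; }
    public string TagAddress { get; set; }
    public decimal TagValue { get; set; }
}

public static class LogParser
{
    // Assumed delimiters, inferred from the sample lines ("1?...^...^...").
    private static readonly char[] Delimiters = { '?', '^' };

    // Lazily yields one LogEntry per "2" line; only the current line is
    // held in memory by the parser itself.
    public static IEnumerable<LogEntry> ReadEntries(TextReader reader)
    {
        string currentDateTime = string.Empty;
        int id = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue; // skip blank lines instead of stopping at the first one

            string[] fields = line.Split(Delimiters);
            if (fields[0] == "1" && fields.Length >= 2)
            {
                // Header line: remember its timestamp for the value lines that follow.
                currentDateTime = fields[1];
            }
            else if (fields[0] == "2" && fields.Length >= 3)
            {
                yield return new LogEntry
                {
                    Id = ++id,
                    CurrentDateTime = currentDateTime,
                    TagAddress = fields[1],
                    // Invariant culture so "3454.33" parses the same on every machine.
                    TagValue = decimal.Parse(fields[2], CultureInfo.InvariantCulture)
                };
            }
        }
    }

    // Convenience overload that opens the file and streams its entries.
    public static IEnumerable<LogEntry> ReadEntries(string filePath)
    {
        using (var reader = new StreamReader(filePath))
        {
            foreach (var entry in ReadEntries(reader))
                yield return entry;
        }
    }
}
```

A caller can `foreach` over `LogParser.ReadEntries(filePath)` and hand each entry (or a small batch of entries) to the database layer instead of collecting millions of objects in one list, which is what keeps memory usage flat.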

About c# - Reading a large text file (over 4 million lines) and parsing each line in .NET: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46407477/
