gpt4 book ai didi

c# - 如何有效地交叉引用 2 个文本文件? |改进我的代码

转载 作者:太空宇宙 更新时间:2023-11-03 13:09:12 26 4
gpt4 key购买 nike

下面概述了我的代码的作用:

  • 读取 150k 行的 TextFileA。
  • 读取 TextFileB,它有 150k 行,是 TextFileA 的交叉引用列表。
  • . 拆分两个文本文件并匹配指定的元素。
  • 最后,输出第三个文本文件,其中包含来自 TextFileA 和 TextFileB 的值。

下面的代码运行良好,直到大约 13,000 行,然后程序变得非常慢。

有人能解释一下为什么程序会以指数方式变慢吗?我该如何改进这段代码?谢谢。

private void BT_Xref_Click(object sender, EventArgs e)
{
//grabs file path from text box
string ManifestPath = TB_Manifest.Text;
//grabs parent directory from file path
string directoryName = Path.GetDirectoryName(ManifestPath);
//creates a new folder for the final output text file
string pathString = Path.Combine(directoryName, "Final Index");
Directory.CreateDirectory(pathString);
//list for matching text lines which will eventually be output to the final text file
List<string> NewData = new List<string>();

//initializes StreamReader for the first text file
StreamReader ManifestReader = new StreamReader(ManifestPath);
String[] ManifestArray = File.ReadAllLines(ManifestPath);
List<string> RemoveManifest = new List<string>(ManifestArray);
//initializes StreamReader for the second text file
StreamReader OutputReader = new StreamReader(TB_Complete.Text);
String[] OutputArray = File.ReadAllLines(TB_Complete.Text);
List<string> RemoveOutput = new List<string>(OutputArray);

//initializes a count which decides at what point a text file should be created
int shortcount = 0;
//.ReadLine is initialized to ignore the first line in both text files
string ManifestLine = ManifestReader.ReadLine();
string OutputLine = OutputReader.ReadLine();

foreach (string mfile in ManifestArray)
{
ManifestLine = ManifestReader.ReadLine();
string ManifestElement = ManifestLine.Split(',')[6];
string ManifestElement2 = ManifestLine.Split(',')[5];
//value to be retreived and output to final text file
string ManifestElementDate = ManifestElement2.Replace("/", "-");
//value to be compared with the other text file
string ManifestNoExt = Regex.Replace(ManifestElement, ("(\\.\\w+$)"),"");
//resets OutpuReader reader to ensure no lines are being skipped
OutputReader.BaseStream.Position = 0;

//counting the mfile position in the ManifestArray
//int removeIndex = Array.IndexOf(ManifestArray, mfile);
//remove by resising the array
//Array.Resize(ref ManifestArray, ManifestArray.Length - 1);

foreach (string ofile in OutputArray)
{
OutputLine = OutputReader.ReadLine();
//value to be comapred with other text file
string OutputElement = OutputLine.Split('|')[2];
//if values equal then add the specified line of text to the list.
if (ManifestNoExt.Equals(OutputElement))
{
NewData.Add(OutputLine + "|" + ManifestElementDate);
RemoveManifest.RemoveAll(item => item == ManifestLine);

if (NewData.Count == 1000)
{
//if youve reached the count then output files into a new text file
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
NewData.Clear();
}
break;
}
}
}
//once all line of text have been searched combine all text files in directory
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
{
foreach (var file in SplitTextFiles)
{
using (var input = File.OpenRead(file))
{
input.CopyTo(FinalIndexFile);
}
File.Delete(file);
}
}
//File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
}

最佳答案

你这里有一个 O(nm) 算法,假设 n 和 m 相同,它实际上是一个 O(n^2)。这不太好,这就是为什么它的速度变慢了(对于每个文件中的 150k 行,您正在查看内部循环的 22500000000 次迭代。不完全确定您的代码试图做什么,但基于条件 如果 (ManifestNoExt.Equals(OutputElement)),我认为您可以按如下方式大幅降低复杂性:

读入TextFileA,将值存入一个以ManifestNoExt为Key,mFile为值的Dictionary。

接下来读取 TextFileB 并遍历 B 中的所有行,并在构建的字典中进行查找。

这将为您提供一个复杂度为 O(n) + O(m) 的算法,该算法速度很快。

此外,我不确定您为什么要读取整个文件,然后在循环内再次读取它们(ManifestArray 和 OutputArray 的内容与文件相同)。这当然也是导致速度变慢的一个原因,因为您最终将重创文件系统。

这个想法的一个完全未经测试的版本:

private void BT_Xref_Click(object sender, EventArgs e)
{
//grabs file path from text box
string ManifestPath = TB_Manifest.Text;
//grabs parent directory from file path
string directoryName = Path.GetDirectoryName(ManifestPath);
//creates a new folder for the final output text file
string pathString = Path.Combine(directoryName, "Final Index");
Directory.CreateDirectory(pathString);
//list for matching text lines which will eventually be output to the final text file
List<string> NewData = new List<string>();

String[] ManifestArray = File.ReadAllLines(ManifestPath);
List<string> RemoveManifest = new List<string>(ManifestArray);
String[] OutputArray = File.ReadAllLines(TB_Complete.Text);
List<string> RemoveOutput = new List<string>(OutputArray);

//initializes a count which decides at what point a text file should be created
int shortcount = 0;
//.ReadLine is initialized to ignore the first line in both text files
string ManifestLine = ManifestReader.ReadLine();
string OutputLine = OutputReader.ReadLine();

Dictionary<string, Tuple<string, string>> ManifestMap = new Dictionary<string, Tuple<string, string>>();

foreach (string mfile in ManifestArray.Skip(1))
{
string ManifestLine = mfile;
string ManifestElement = ManifestLine.Split(',')[6];
string ManifestElement2 = ManifestLine.Split(',')[5];
//value to be retreived and output to final text file
string ManifestElementDate = ManifestElement2.Replace("/", "-");
//value to be compared with the other text file
string ManifestNoExt = Regex.Replace(ManifestElement, ("(\\.\\w+$)"),"");

ManifestMap.Add(ManifestNoExt, Tuple.Create(ManifestElementDate, ManifestLine));

//counting the mfile position in the ManifestArray
//int removeIndex = Array.IndexOf(ManifestArray, mfile);
//remove by resising the array
//Array.Resize(ref ManifestArray, ManifestArray.Length - 1);
}

foreach (string ofile in OutputArray.Skip(1))
{
//value to be compared with other text file
string OutputElement = OutputLine.Split('|')[2];
//if values equal then add the specified line of text to the list.
if (ManifestMap.ContainsKey(OutputElement))
{
NewData.Add(OutputLine + "|" + ManifestMap[OutputElement].Item1);
RemoveManifest.RemoveAll(item => item == ManifestMap[OutputElement].Item2);

if (NewData.Count == 1000)
{
//if youve reached the count then output files into a new text file
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
NewData.Clear();
}
break;
}
}

//once all line of text have been searched combine all text files in directory
shortcount = shortcount + 1;
File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
{
foreach (var file in SplitTextFiles)
{
using (var input = File.OpenRead(file))
{
input.CopyTo(FinalIndexFile);
}
File.Delete(file);
}
}
//File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
}

关于c# - 如何有效地交叉引用 2 个文本文件? |改进我的代码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29785351/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com