c# - How can I efficiently cross-reference 2 text files? | Improve my code

Reposted by: 太空宇宙, updated: 2023-11-03 13:09:12

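Here is an outline of what my code does:

Reads TextFileA, which has 150k lines.
Reads TextFileB, which also has 150k lines and is a cross-reference list for TextFileA.
Splits the lines of both text files and matches the specified elements.
Finally, outputs a third text file containing values from both TextFileA and TextFileB.

The code below works fine up to about line 13,000, and then the program becomes very slow.

Can someone explain why the program slows down exponentially, and how I can improve this code? Thanks.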

private void BT_Xref_Click(object sender, EventArgs e)
    {
        //grabs file path from text box
        string ManifestPath = TB_Manifest.Text;
        //grabs parent directory from file path
        string directoryName = Path.GetDirectoryName(ManifestPath);
        //creates a new folder for the final output text file
        string pathString = Path.Combine(directoryName, "Final Index");
        Directory.CreateDirectory(pathString);
        //list for matching text lines which will eventually be output to the final text file
        List<string> NewData = new List<string>();

        //initializes StreamReader for the first text file
        StreamReader ManifestReader = new StreamReader(ManifestPath);
        String[] ManifestArray = File.ReadAllLines(ManifestPath);
        List<string> RemoveManifest = new List<string>(ManifestArray);
        //initializes StreamReader for the second text file
        StreamReader OutputReader = new StreamReader(TB_Complete.Text);
        String[] OutputArray = File.ReadAllLines(TB_Complete.Text);
        List<string> RemoveOutput = new List<string>(OutputArray);

        //initializes a count which decides at what point a text file should be created
        int shortcount = 0;
        //.ReadLine is initialized to ignore the first line in both text files
        string ManifestLine = ManifestReader.ReadLine();
        string OutputLine = OutputReader.ReadLine();

        foreach (string mfile in ManifestArray)
        {
            ManifestLine = ManifestReader.ReadLine();
            string ManifestElement = ManifestLine.Split(',')[6];
            string ManifestElement2 = ManifestLine.Split(',')[5];
            //value to be retrieved and output to final text file
            string ManifestElementDate = ManifestElement2.Replace("/", "-");
            //value to be compared with the other text file
            string ManifestNoExt = Regex.Replace(ManifestElement, ("(\\.\\w+$)"),"");
            //resets OutputReader to ensure no lines are being skipped
            OutputReader.BaseStream.Position = 0;

            //counting the mfile position in the ManifestArray
            //int removeIndex = Array.IndexOf(ManifestArray, mfile);
            //remove by resizing the array
            //Array.Resize(ref ManifestArray, ManifestArray.Length - 1);

            foreach (string ofile in OutputArray)
            {
                OutputLine = OutputReader.ReadLine();
                //value to be compared with other text file
                string OutputElement = OutputLine.Split('|')[2];
                //if values equal then add the specified line of text to the list.
                if (ManifestNoExt.Equals(OutputElement))
                {
                    NewData.Add(OutputLine + "|" + ManifestElementDate);
                    RemoveManifest.RemoveAll(item => item == ManifestLine);

                    if (NewData.Count == 1000)
                    {
                        //if you've reached the count then output files into a new text file
                        shortcount = shortcount + 1;
                        File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
                        NewData.Clear();
                    }
                    break;
                }
            }
        }
        //once all lines of text have been searched, combine all text files in the directory
        shortcount = shortcount + 1;
        File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
        String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
        using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
        {
            foreach (var file in SplitTextFiles)
            {
                using (var input = File.OpenRead(file))
                {
                    input.CopyTo(FinalIndexFile);
                }
                File.Delete(file);
            }
        }
        //File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
    }

Best answer

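You have an O(nm) algorithm here; assuming n and m are the same, it is effectively O(n^2). That is not great, and it is why things slow down: with 150k lines in each file, you are looking at up to 22,500,000,000 iterations of the inner loop. I'm not entirely sure what your code is trying to do, but based on the condition if (ManifestNoExt.Equals(OutputElement)), I think you can reduce the complexity drastically as follows:

Read TextFileA and store its values in a Dictionary with ManifestNoExt as the key and mFile as the value.

Next, read TextFileB, iterate over all of its lines, and look each one up in the dictionary you built.

This gives you an algorithm with complexity O(n) + O(m), which is fast, because a dictionary lookup takes constant time on average.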
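The two steps above can be sketched on in-memory lines as follows. This is only a minimal illustration, assuming the column layout from the question (comma-separated manifest with the date in field 5 and the file name in field 6; pipe-separated output with the name in field 2). The names `CrossReferenceSketch`, `BuildIndex` and `Match` are made up for this example. Two further assumptions: the sketch stores only the date as the dictionary value rather than the whole line, and it keeps the first entry when a key appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CrossReferenceSketch
{
    // Step 1: index the TextFileA lines by file name without extension. O(n).
    public static Dictionary<string, string> BuildIndex(IEnumerable<string> manifestLines)
    {
        var index = new Dictionary<string, string>();
        foreach (string line in manifestLines)
        {
            string[] fields = line.Split(',');
            string key = Regex.Replace(fields[6], @"\.\w+$", "");
            string date = fields[5].Replace("/", "-");
            // Assumption: the first entry wins when a key appears twice.
            if (!index.ContainsKey(key))
            {
                index.Add(key, date);
            }
        }
        return index;
    }

    // Step 2: one dictionary lookup per TextFileB line. O(m).
    public static List<string> Match(IEnumerable<string> manifestLines,
                                     IEnumerable<string> outputLines)
    {
        Dictionary<string, string> index = BuildIndex(manifestLines);
        var matches = new List<string>();
        foreach (string line in outputLines)
        {
            string key = line.Split('|')[2];
            string date;
            if (index.TryGetValue(key, out date))
            {
                matches.Add(line + "|" + date);
            }
        }
        return matches;
    }
}
```

For example, a manifest line "a,b,c,d,e,2015/04/21,report1.pdf" and an output line "x|y|report1" produce the single match "x|y|report1|2015-04-21".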

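Also, I'm not sure why you read each file in full and then read it again inside the loops (ManifestArray and OutputArray already hold the same content as the files). That is certainly another cause of the slowdown, because you end up hammering the file system.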
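One way to make sure each file is read exactly once is to pick a single reading API and use only that. The sketch below uses `File.ReadLines`, which yields lines lazily, together with `Skip(1)` to ignore the first line as the question does. The helper name `DataLines` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SinglePassReader
{
    // Streams the file lazily, one line at a time, and skips the first line.
    // No second StreamReader and no stream repositioning are involved.
    public static IEnumerable<string> DataLines(string path)
    {
        return File.ReadLines(path).Skip(1);
    }
}
```

Each enumeration of the returned sequence opens the file again, so iterate it once, or call `ToList()` on it if the lines are needed more than once.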

A completely untested version of this idea:

private void BT_Xref_Click(object sender, EventArgs e)
{
    //grabs file path from text box
    string ManifestPath = TB_Manifest.Text;
    //grabs parent directory from file path
    string directoryName = Path.GetDirectoryName(ManifestPath);
    //creates a new folder for the final output text file
    string pathString = Path.Combine(directoryName, "Final Index");
    Directory.CreateDirectory(pathString);
    //list for matching text lines which will eventually be output to the final text file
    List<string> NewData = new List<string>();

    //each file is read exactly once; no StreamReader is needed
    String[] ManifestArray = File.ReadAllLines(ManifestPath);
    String[] OutputArray = File.ReadAllLines(TB_Complete.Text);

    //initializes a count which decides at what point a text file should be created
    int shortcount = 0;

    //maps the file name without extension to (date, full manifest line)
    Dictionary<string, Tuple<string, string>> ManifestMap = new Dictionary<string, Tuple<string, string>>();

    //Skip(1) ignores the first line of the file (needs using System.Linq)
    foreach (string ManifestLine in ManifestArray.Skip(1))
    {
        string[] ManifestFields = ManifestLine.Split(',');
        string ManifestElement = ManifestFields[6];
        string ManifestElement2 = ManifestFields[5];
        //value to be retrieved and output to final text file
        string ManifestElementDate = ManifestElement2.Replace("/", "-");
        //value to be compared with the other text file
        string ManifestNoExt = Regex.Replace(ManifestElement, "(\\.\\w+$)", "");

        //Dictionary.Add throws on a duplicate key, so only the first entry per key is kept
        if (!ManifestMap.ContainsKey(ManifestNoExt))
        {
            ManifestMap.Add(ManifestNoExt, Tuple.Create(ManifestElementDate, ManifestLine));
        }
    }

    foreach (string OutputLine in OutputArray.Skip(1))
    {
        //value to be compared with other text file
        string OutputElement = OutputLine.Split('|')[2];
        //if the key is in the dictionary then add the specified line of text to the list
        Tuple<string, string> ManifestEntry;
        if (ManifestMap.TryGetValue(OutputElement, out ManifestEntry))
        {
            NewData.Add(OutputLine + "|" + ManifestEntry.Item1);

            if (NewData.Count == 1000)
            {
                //if you've reached the count then output files into a new text file
                shortcount = shortcount + 1;
                File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
                NewData.Clear();
            }
            //no break here: every remaining line of TextFileB still has to be checked
        }
    }

    //once all lines of text have been searched, combine all text files in the directory
    shortcount = shortcount + 1;
    File.WriteAllLines(pathString + "\\test" + shortcount + ".txt", NewData);
    String[] SplitTextFiles = Directory.GetFiles(pathString, "*.*", SearchOption.AllDirectories);
    using (var FinalIndexFile = File.Create(pathString + "\\FinalIndex.txt"))
    {
        foreach (var file in SplitTextFiles)
        {
            using (var input = File.OpenRead(file))
            {
                input.CopyTo(FinalIndexFile);
            }
            File.Delete(file);
        }
    }
    //File.WriteAllLines("\\test.txt", Directory.EnumerateFiles(pathString, @"*.txt").SelectMany(file => File.ReadLines(file)));
}
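The chunk-and-merge step at the end of the method is carried over from the question. `Directory.GetFiles` does not guarantee the order of the names it returns, so the merged file may not keep the matches in the order they were found. A possible simplification, assuming the matches can be held in memory or produced lazily, is to write them to FinalIndex.txt in a single call. `FinalIndexWriter` and `WriteFinalIndex` are made-up names for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FinalIndexWriter
{
    // Writes every matched line straight to FinalIndex.txt inside the given
    // directory, in the order the matches were produced. No temporary
    // test1.txt, test2.txt, ... files are created.
    public static string WriteFinalIndex(string directory, IEnumerable<string> matches)
    {
        Directory.CreateDirectory(directory);
        string finalPath = Path.Combine(directory, "FinalIndex.txt");
        File.WriteAllLines(finalPath, matches);
        return finalPath;
    }
}
```

With this helper, the matching loop would only add lines to NewData and the method would finish with one call such as `FinalIndexWriter.WriteFinalIndex(pathString, NewData)`.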

Regarding c# - How can I efficiently cross-reference 2 text files? | Improve my code, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29785351/
