
java - Optimizing CSV parsing for speed

Reposted. Author: 行者123. Updated: 2023-11-30 10:37:09

I am working on this "program" that reads data from two large CSV files (line by line), compares array elements from the files, and, when a match is found, writes the data I need to a third file. My only problem is that it is very slow: it reads 1-2 lines per second, which is extremely slow considering I have millions of records. Any ideas on how to make it faster? Here is my code:

    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Scanner;

    public class ReadWriteCsv {

        public static void main(String[] args) throws IOException {

            FileInputStream inputStream = null;
            FileInputStream inputStream2 = null;
            Scanner sc = null;
            Scanner sc2 = null;
            String csvSeparator = ",";
            String line;
            String line2;
            String path = "D:/test1.csv";
            String path2 = "D:/test2.csv";
            String path3 = "D:/newResults.csv";
            String[] columns;
            String[] columns2;
            Boolean matchFound = false;
            int count = 0;
            StringBuilder builder = new StringBuilder();

            FileWriter writer = new FileWriter(path3);

            try {
                // specifies where to take the files from
                inputStream = new FileInputStream(path);
                inputStream2 = new FileInputStream(path2);

                // creating scanners for files
                sc = new Scanner(inputStream, "UTF-8");

                // while there is another line available do:
                while (sc.hasNextLine()) {
                    count++;
                    // storing the current line in the temporary variable "line"
                    line = sc.nextLine();
                    System.out.println("Number of lines read so far: " + count);
                    // defines the columns[] as the line being split by ","
                    columns = line.split(",");
                    inputStream2 = new FileInputStream(path2);
                    sc2 = new Scanner(inputStream2, "UTF-8");

                    // checks if there is a line available in File2 and goes in the
                    // while loop, reading file2
                    while (!matchFound && sc2.hasNextLine()) {
                        line2 = sc2.nextLine();
                        columns2 = line2.split(",");

                        if (columns[3].equals(columns2[1])) {
                            matchFound = true;
                            builder.append(columns[3]).append(csvSeparator);
                            builder.append(columns[1]).append(csvSeparator);
                            builder.append(columns2[2]).append(csvSeparator);
                            builder.append(columns2[3]).append("\n");
                            String result = builder.toString();
                            writer.write(result);
                        }

                    }
                    builder.setLength(0);
                    sc2.close();
                    matchFound = false;
                }

                if (sc.ioException() != null) {
                    throw sc.ioException();
                }

            } finally {
                // then I close my inputStreams, scanners and writer
                if (inputStream != null) inputStream.close();
                if (inputStream2 != null) inputStream2.close();
                if (sc != null) sc.close();
                if (sc2 != null) sc2.close();
                writer.close();
            }
        }
    }

Best Answer

Use an existing CSV library instead of rolling your own. It will be far more robust than what you have now.

However, your problem is not CSV parsing speed. The problem is that your algorithm is O(n^2): for every line in the first file, you scan the entire second file. This kind of algorithm blows up quickly as the data grows, and with millions of lines you run into exactly this trouble. You need a better algorithm.

The other problem is that you re-parse the second file on every scan. You should at least read it into an ArrayList or similar at the start of the program, so you only load and parse it once.
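Both fixes can be combined: parse the second file once into a HashMap keyed by its join column, then stream the first file against that map, turning the O(n*m) nested scan into O(n + m). A minimal sketch of the idea (the class and method names are illustrative; in-memory lists stand in for the file reading, and the column indexes 3 and 1 follow the original code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvJoin {

    // Parse the second file's lines ONCE into a map: join key (column 1) -> full row.
    static Map<String, String[]> index(List<String> lines2) {
        Map<String, String[]> byKey = new HashMap<>();
        for (String line : lines2) {
            String[] cols = line.split(",");
            byKey.put(cols[1], cols);
        }
        return byKey;
    }

    // Stream the first file's lines; each lookup by column 3 is O(1) on average.
    static List<String> join(List<String> lines1, Map<String, String[]> byKey) {
        List<String> out = new ArrayList<>();
        for (String line : lines1) {
            String[] cols = line.split(",");
            String[] match = byKey.get(cols[3]);
            if (match != null) {
                // Same output columns as the original code.
                out.add(cols[3] + "," + cols[1] + "," + match[2] + "," + match[3]);
            }
        }
        return out;
    }
}
```

For the real files you would fill `lines2` with a `BufferedReader` pass over test2.csv and write each joined row to the output writer instead of collecting them in a list; a buffered reader also avoids the per-line overhead of `Scanner`.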

About java - Optimizing CSV parsing for speed: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40227280/
