
java - How to aggregate CSV data using group by in Java?

Reposted. Author: 行者123. Updated: 2023-12-01 17:01:51

Suppose I have the following CSV file of application action logs. The CSV may contain 3 to 4 million rows.

Company, ActionsType, Action
ABC, Downloaded, Tutorial 1
ABC, Watched, Tutorial 2
PQR, Subscribed, Tutorial 1
ABC, Watched, Tutorial 2
PQR, Subscribed, Tutorial 3
XYZ, Subscribed, Tutorial 1
XYZ, Watched, Tutorial 3
PQR, Downloaded, Tutorial 1

Is there a way in Java to aggregate this data by grouping on the company name, and to show a counter for each ActionsType as columns, like this?

Company, Downloaded, Watched, Subscribed
ABC, 1, 2, 0
PQR, 1, 0, 2
XYZ, 0, 1, 1

I thought about using OpenCSV to load the CSV file into a list, but is that efficient for a CSV file with millions of rows?

Best answer

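If you are trying to aggregate the data, loading the whole file into a list is definitely inefficient. You should look at MapReduce for aggregating big data.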
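As a rough single-machine illustration of the map-and-reduce idea (this is not an actual MapReduce job), each line can be mapped to a (company, action type) pair and the pairs reduced to counts with `Collectors.groupingBy`. The class name `GroupBySketch` and the helper `countByCompany` are made up for this sketch, and it assumes the first line is the header and that fields never contain commas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySketch {

    // Hypothetical helper: returns company -> (action type -> number of rows).
    public static Map<String, Map<String, Long>> countByCompany(List<String> lines) {
        return lines.stream()
                .skip(1)                                  // assumption: first line is the header
                .map(line -> line.split(","))
                .filter(cols -> cols.length >= 2)         // ignore blank or malformed rows
                .collect(Collectors.groupingBy(
                        cols -> cols[0].trim(),           // "map" step: key each row by company
                        Collectors.groupingBy(
                                cols -> cols[1].trim(),   // then by action type
                                Collectors.counting()))); // "reduce" step: count the rows
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Company, ActionsType, Action",
                "ABC, Downloaded, Tutorial 1",
                "ABC, Watched, Tutorial 2",
                "PQR, Subscribed, Tutorial 1",
                "ABC, Watched, Tutorial 2");

        // Pivot the nested counts into columns; action types never seen count as 0.
        // The row order is not guaranteed, because groupingBy returns a plain Map here.
        System.out.println("Company, Downloaded, Watched, Subscribed");
        for (Map.Entry<String, Map<String, Long>> e : countByCompany(lines).entrySet()) {
            Map<String, Long> byType = e.getValue();
            System.out.println(e.getKey()
                    + ", " + byType.getOrDefault("Downloaded", 0L)
                    + ", " + byType.getOrDefault("Watched", 0L)
                    + ", " + byType.getOrDefault("Subscribed", 0L));
        }
    }
}
```

The nested map only stores action types that actually occurred, which is why the pivot step uses `getOrDefault` to fill in zeros.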

Here is a solution that does not use MapReduce:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CSVMapper {

    public String transformCsv(String csvFile) {
        return csvMapToString(getCsvMap(csvFile));
    }

    private Map<String, int[]> getCsvMap(String csvFile) {
        // <K,V> := <Company, [Downloaded, Watched, Subscribed]>
        // LinkedHashMap keeps the companies in first-seen order for the output.
        Map<String, int[]> csvMap = new LinkedHashMap<>();

        try (BufferedReader reader = new BufferedReader(new StringReader(csvFile))) {
            String csvLine;
            while ((csvLine = reader.readLine()) != null) {
                String[] csvColumns = csvLine.split(",");
                // Skip blank or malformed lines that have no ActionsType column.
                if (csvColumns.length < 2) {
                    continue;
                }
                String company = csvColumns[0].trim();
                String actionsType = csvColumns[1].trim();

                // Skip the header row.
                if (company.equals("Company")) {
                    continue;
                }

                // A new int[3] starts out as {0, 0, 0}.
                int[] columnValues = csvMap.computeIfAbsent(company, k -> new int[3]);
                switch (actionsType) {
                    case "Downloaded": columnValues[0]++; break;
                    case "Watched":    columnValues[1]++; break;
                    case "Subscribed": columnValues[2]++; break;
                    default: break; // unknown action types are not counted
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a String; rethrow instead of swallowing.
            throw new UncheckedIOException(e);
        }
        return csvMap;
    }

    private String csvMapToString(Map<String, int[]> csvMap) {
        StringBuilder newCsvFile = new StringBuilder();
        newCsvFile.append("Company, Downloaded, Watched, Subscribed\n");
        for (Map.Entry<String, int[]> entry : csvMap.entrySet()) {
            int[] columnValues = entry.getValue();
            newCsvFile.append(entry.getKey())
                      .append(", ").append(columnValues[0])
                      .append(", ").append(columnValues[1])
                      .append(", ").append(columnValues[2])
                      .append('\n');
        }
        return newCsvFile.toString();
    }

    public static void main(String[] args) {
        String csvFile = "Company, ActionsType, Action\n" +
                "ABC, Downloaded, Tutorial 1\n" +
                "ABC, Watched, Tutorial 2\n" +
                "PQR, Subscribed, Tutorial 1\n" +
                "ABC, Watched, Tutorial 2\n" +
                "PQR, Subscribed, Tutorial 3\n" +
                "XYZ, Subscribed, Tutorial 1\n" +
                "XYZ, Watched, Tutorial 3\n" +
                "PQR, Downloaded, Tutorial 1";

        System.out.println(new CSVMapper().transformCsv(csvFile));
    }
}
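The class above receives the whole CSV as a single String. For a file with 3 to 4 million rows, it may be preferable to read the file line by line so that only the per-company counters stay in memory. The following is a minimal sketch of that variation, not part of the original answer: the class name `CsvFileAggregator`, the method `aggregate`, and the default path `actions.csv` are placeholders, and it assumes a UTF-8 file whose first line is the header and whose fields contain no quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class CsvFileAggregator {

    // Output column order: Downloaded, Watched, Subscribed.
    private static final String[] ACTION_TYPES = {"Downloaded", "Watched", "Subscribed"};

    // Hypothetical helper: streams the file and returns company -> counters per action type.
    public static Map<String, int[]> aggregate(Path csvPath) throws IOException {
        Map<String, int[]> counts = new TreeMap<>(); // sorted by company name
        try (BufferedReader reader = Files.newBufferedReader(csvPath, StandardCharsets.UTF_8)) {
            reader.readLine(); // assumption: the first line is the header, so skip it
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length < 2) {
                    continue; // blank or malformed line
                }
                int[] row = counts.computeIfAbsent(cols[0].trim(),
                        k -> new int[ACTION_TYPES.length]);
                String actionType = cols[1].trim();
                for (int i = 0; i < ACTION_TYPES.length; i++) {
                    if (ACTION_TYPES[i].equals(actionType)) {
                        row[i]++;
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "actions.csv" is a placeholder; pass the real path as the first argument.
        Path csvPath = Paths.get(args.length > 0 ? args[0] : "actions.csv");
        System.out.println("Company, Downloaded, Watched, Subscribed");
        for (Map.Entry<String, int[]> e : aggregate(csvPath).entrySet()) {
            int[] row = e.getValue();
            System.out.println(e.getKey() + ", " + row[0] + ", " + row[1] + ", " + row[2]);
        }
    }
}
```

Memory use here grows with the number of distinct companies rather than with the number of rows. If the real data can contain quoted fields with embedded commas, a proper CSV parser would be needed in place of `split(",")`.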

Regarding "java - How to aggregate CSV data using group by in Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27167026/
