
hadoop - Mapper output as comma-separated values

Reposted. Author: 行者123. Updated: 2023-12-02 21:17:11

I have a CSV file like the one below. I am writing a MapReduce program that finds the product with the highest sales on a given date.

CSV Data

For this, the mapper output should have the following form:

09/1/2 => [product1, product2, product1, product2, product4, .....]

I wrote the following Mapper code:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {

    String line = value.toString();
    String[] arrLine = line.split(",");

    String strDateTime = arrLine[0];
    String strDate = strDateTime.substring(0, strDateTime.indexOf(" "));
    String strProductName = arrLine[1];

    Map products = new HashMap<String, String>();
    String strProdAdded = null;

    if (products.get(strDate) != null) {
        strProdAdded = products.get(strDate).toString();
        strProdAdded += strProductName + ",";
        products.put(strDate, strProdAdded);
    } else {
        products.put(strDate, strProductName);
    }

    output.collect(new Text(strDate), new Text(strProductName));
}

But I cannot figure out how to get the desired output, which looks like this:

09/1/2 => [product1, product2, product1, product2, product4, .....]

Best answer

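You will have to use the cleanup() method. I added System.out statements so you can see what is happening inside the method. Check the Mapper class documentation for the other methods available.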

// Requires imports: java.io.IOException, java.util.ArrayList, java.util.HashMap, java.util.Map,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Mapper
public static class StackMapper extends Mapper<Object, Text, Text, Text> {

    // Distinct products seen so far for each date, kept across all map() calls of this task.
    private Map<Text, ArrayList<Text>> products = new HashMap<Text, ArrayList<Text>>();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        String line = value.toString();
        String[] arrLine = line.split(",");

        // The first field is "date time"; keep only the date part.
        Text strDate = new Text(arrLine[0].substring(0, arrLine[0].indexOf(" ")));
        Text strProductName = new Text(arrLine[1]);

        // Fetch the list that belongs to this date, so records need not arrive sorted by date.
        ArrayList<Text> p = products.get(strDate);

        if (p != null) {
            // Each product is stored at most once per date.
            if (!p.contains(strProductName)) {
                System.out.println("has date: " + strDate + " " + strProductName + " not exist, added to list: " + p.toString());
                p.add(strProductName);
            }
            System.out.println("has date: " + strDate + ", " + strProductName + " added to list: " + p.toString());
        } else {
            p = new ArrayList<Text>();
            p.add(strProductName);
            products.put(strDate, p);

            System.out.println("new date: " + strDate + ", " + strProductName + " added to list: " + p.toString());
        }
    }

    // Runs once after the last map() call: emit one record per date.
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Map.Entry<Text, ArrayList<Text>> entry : products.entrySet()) {
            context.write(entry.getKey(), new Text(entry.getValue().toString()));
        }
    }
}
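
The role of cleanup() in the mapper above can be illustrated without Hadoop. The following is a minimal sketch, not the Hadoop API: LifecycleSketch, its map(String) and cleanup() methods, and the emitted list are hypothetical stand-ins for the mapper, its callbacks, and context.write(). It assumes the framework calls map() once per record and cleanup() once at the end of the task, which is why the per-date lists have to live in a field rather than in a local variable inside map(). Unlike the mapper above, this sketch keeps duplicate products, as in the form asked for in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one mapper task; this is NOT the Hadoop API.
class LifecycleSketch {
    // Field-level state: survives across map() calls within one task.
    private final Map<String, List<String>> products = new LinkedHashMap<>();
    // Collected output records, standing in for context.write(...).
    final List<String> emitted = new ArrayList<>();

    // Called once per input line, like Mapper.map().
    void map(String line) {
        String[] fields = line.split(",");
        String date = fields[0].substring(0, fields[0].indexOf(' '));
        products.computeIfAbsent(date, d -> new ArrayList<>()).add(fields[1]);
        // Nothing is emitted here: the list for this date may still grow.
    }

    // Called once after the last line, like Mapper.cleanup().
    void cleanup() {
        for (Map.Entry<String, List<String>> e : products.entrySet()) {
            emitted.add(e.getKey() + " => " + e.getValue());
        }
    }

    // Drives the calls in the assumed order: map() per record, then cleanup().
    static List<String> run(List<String> lines) {
        LifecycleSketch task = new LifecycleSketch();
        for (String line : lines) {
            task.map(line);
        }
        task.cleanup();
        return task.emitted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1/2/09 6:17,product1,f3,f4,f5",
                "1/2/09 6:17,product2,f3,f4,f5",
                "1/2/10 6:17,product1,f3,f4,f5u");
        run(lines).forEach(System.out::println);
    }
}
```

Keep in mind that in a real job each map task has its own mapper instance, so when the input is split across several tasks the same date can be emitted more than once, and a reducer would still be needed to merge those records.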

Input:
1/2/09 6:17,product1,f3,f4,f5
1/2/09 6:17,product2,f3,f4,f5
1/2/09 6:17,product3,f3,f4,f5
1/2/09 6:17,product4,f3,f4,f5
1/2/09 6:17,product4,f3,f4,f5
1/2/10 6:17,product1,f3,f4,f5u
1/2/10 6:17,product2,f3,f4,f5u
1/2/10 6:17,product3,f3,f4,f5u
1/2/11 6:17,product2,f3,f4,f5u
1/2/12 6:17,product2,f3,f4,f5u
1/2/12 6:17,product3,f3,f4,f5u

Output:
1/2/09 [product1, product2, product3, product4]
1/2/10 [product1, product2, product3]
1/2/12 [product2, product3]
1/2/11 [product2]
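
The brackets and the ", " separators in the output above come from ArrayList.toString(). If plain comma-separated values are wanted instead, one option is String.join. This is a small sketch on plain strings; CsvJoinSketch and toCsv are illustrative names, and in the mapper the Text elements would first have to be converted with toString(), since String.join expects CharSequence elements.

```java
import java.util.Arrays;
import java.util.List;

class CsvJoinSketch {
    // Joins product names with commas, without surrounding brackets or spaces.
    static String toCsv(List<String> productNames) {
        return String.join(",", productNames);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("product1", "product2", "product3");
        System.out.println(names);        // [product1, product2, product3]
        System.out.println(toCsv(names)); // product1,product2,product3
    }
}
```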

Standard output of the MR job:
new date: 1/2/09, product1 added to list: [product1]
has date: 1/2/09 product2 not exist, added to list: [product1]
has date: 1/2/09, product2 added to list: [product1, product2]
has date: 1/2/09 product3 not exist, added to list: [product1, product2]
has date: 1/2/09, product3 added to list: [product1, product2, product3]
has date: 1/2/09 product4 not exist, added to list: [product1, product2, product3]
has date: 1/2/09, product4 added to list: [product1, product2, product3, product4]
has date: 1/2/09, product4 added to list: [product1, product2, product3, product4]
new date: 1/2/10, product1 added to list: [product1]
has date: 1/2/10 product2 not exist, added to list: [product1]
has date: 1/2/10, product2 added to list: [product1, product2]
has date: 1/2/10 product3 not exist, added to list: [product1, product2]
has date: 1/2/10, product3 added to list: [product1, product2, product3]
new date: 1/2/11, product2 added to list: [product2]
new date: 1/2/12, product2 added to list: [product2]
has date: 1/2/12 product3 not exist, added to list: [product2]
has date: 1/2/12, product3 added to list: [product2, product3]
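
The question's stated goal is the best-selling product per date. The mapper above stores each product at most once per date (the contains() check), so its lists no longer carry sales counts. Assuming the duplicates are kept, as in the form requested in the question, the top product for one date could be picked by counting occurrences. This is a sketch in plain Java; TopProductSketch and topProduct are hypothetical names, and breaking ties in favor of the product seen first is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopProductSketch {
    // Returns the product that occurs most often; ties go to the one seen first.
    static String topProduct(List<String> productNames) {
        // LinkedHashMap keeps first-seen order, which makes tie-breaking deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : productNames) {
            counts.merge(name, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null when the list is empty
    }

    public static void main(String[] args) {
        // The 1/2/09 products from the sample input, duplicates included.
        List<String> sold = Arrays.asList(
                "product1", "product2", "product3", "product4", "product4");
        System.out.println(topProduct(sold)); // product4
    }
}
```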

Regarding hadoop - Mapper output as comma-separated values, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38436944/
