
java - Missing Map/Combine/Reduce logic: how to keep track of certain fields

Reposted. Author: 可可西里. Updated: 2023-11-01 16:07:45

I'm trying to use a mapper/reducer to process something different from what I've worked with before.

I currently have an input file like this:

1 50000 2015 pc technology 
2 15424 1998 mouse technology
3 78420 2010 pen technology
4 8452 2000 pen stationery
5 4125 2000 pen stationery

The fields are: id, price, year, item, type

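What I'm trying to do is compute the average price of a specific item, for each type and for each year in which that item was sold. As an example, I started with pens: what was the average price of a pen in 2000? In my sample there are two types of pen (a digital pen for PCs and a standard pen), so I'd like output like this: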

pen stationery 6288 2000
pen technology 78420 2010

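My problem is that I don't know how to do this. I know how to compute an average with a combiner/reducer, but I don't know how to keep track of the year and the item. My logic is: check whether the element is a pen, then keep track of that pen's year and price, and compute the average from that data. But I don't know how to implement it. Any help is much appreciated, thanks.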

import java.io.IOException;
import java.util.StringTokenizer;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Draft2 {

public static class TokenizerMapper extends Mapper<Object, Text, Text, Text>{

private Text word = new Text();
private Text word2 = new Text();

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

String[] tokens = value.toString().split(",");

String price = "";
String item = "";
String typ = "";
String year = "";

if(tokens.length >= 2)
price = tokens[1];

if(tokens.length >= 3)
year = tokens[2];

if (tokens.length >=5)
typ = tokens[4];

if (tokens.length >=14)
item = tokens[13];


if(!item.isEmpty())
{
word.set(item + "|" + year);
word2.set(price + "|" + typ);
context.write(word, word2);
}
}
}

public static class MeanCombiner extends Reducer <Text,Text,Text,Text> {
private Text word = new Text();
private Text word2 = new Text();

public void combine(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
int sum = 0;
int counter = 0;

final Iterator<Text> itr = values.iterator();
String[] strs = key.toString().split("|");
final String[] tokens = values.toString().split("|");

String item = strs[0];
String typ = tokens[1];
String type = "pen";
String year = strs[1];
Boolean found = false;


for (Text val: values) {
String[] strs = key.toString().split("|");
final String[] tokens = values.toString().split("|");
String item= strs[0];
String typ = tokens[1];
String type = "pen";
String year = strs[1];

if (typ.equals(type)) {
found = true;
//don't know how to go on

}

/*this part is for the average but with wrong data*/
while (itr.hasNext()) {
if (typ.equals(type)) {
final String price = tokens[0];
final int value = Integer.parseInt(price);
counter++;
sum += value;
}
}

final int average = sum/counter;

String avg = Integer.toString(average);
String count = Integer.toString(counter);
word.set(key.toString());
word2.set(avg + "|" + count);
context.write(word, word2);
}
}

/*The reducer is incomplete*/
public static class MeanReducer extends Reducer<Text,Text,Text,Text> {

private Text word = new Text();
private Text word2 = new Text();

public void reduce(Text key, Iterable<Text> values, Context context
) throws IOException, InterruptedException {


word.set(k);
word2.set();
context.write(word, word2);

}
}

public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Draft2");
job.setJarByClass(Draft2.class);
job.setCombinerClass(MeanCombiner.class);
job.setMapperClass(TokenizerMapper.class);
job.setReducerClass(MeanReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Best answer

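You should not use a combiner to compute the average itself. Averaging partial averages generally does not give the overall average, and Hadoop does not guarantee how many times the combiner runs, or that it runs at all.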
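To make the combiner problem concrete, here is a minimal plain-Java sketch (no Hadoop dependency). The prices 10, 20, 60 and their split across two map tasks are hypothetical values chosen for illustration, and the class and method names are not from the original post. It compares averaging per-split averages with carrying partial (sum, count) pairs and dividing once at the end.

```java
import java.util.List;

// Illustrative only: shows why a combiner must not emit averages.
// The prices and the split into two "map tasks" are hypothetical.
class CombinerAverageDemo {

    // Integer mean of a non-empty list, matching the integer output in the question.
    static long mean(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Wrong: each split emits its own average and the reducer averages those.
    static long averageOfAverages(List<List<Long>> splits) {
        long sum = 0;
        for (List<Long> split : splits) {
            sum += mean(split);
        }
        return sum / splits.size();
    }

    // Right: each split contributes a partial (sum, count); the totals are
    // added up and divided once at the end.
    static long averageFromPartials(List<List<Long>> splits) {
        long sum = 0;
        long count = 0;
        for (List<Long> split : splits) {
            for (long v : split) {
                sum += v;
            }
            count += split.size();
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<List<Long>> splits = List.of(List.of(10L, 20L), List.of(60L));
        System.out.println(averageOfAverages(splits));   // 37
        System.out.println(averageFromPartials(splits)); // 30
    }
}
```

A combiner that emitted (sum, count) pairs would therefore be safe in principle, but for this problem the simpler route is to drop the combiner. Note also that Hadoop's Reducer class has no combine method to override: a combiner is a Reducer subclass whose reduce method is called, so the combine method in the question would never run.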

I'll explain the solution assuming the following is your input:

1 50000 2015 pc technology 
2 15424 1998 mouse technology
3 78420 2010 pen technology
4 8452 2000 pen stationery
5 4125 2000 pen stationery

Mapper:

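  1. Parse each record by splitting on whitespace (" ").
  2. For each record, emit (key, value) as (year|item|type, price). The combination year|item|type identifies a group: records with the same year, item, and type share a key. For example, for the record "5 4125 2000 pen stationery" you emit the key "2000|pen|stationery" and the value 4125.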
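The two mapper steps above can be sketched in plain Java as follows. The class name RecordParser and the method toKeyValue are hypothetical helpers, not part of the original answer. In a real Hadoop Mapper the returned pair would be handed to context.write as two Text objects.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Sketch of the parsing done inside map(); in a real Hadoop Mapper the returned
// pair would be passed to context.write(new Text(key), new Text(value)).
class RecordParser {

    // Turns "5 4125 2000 pen stationery" into ("2000|pen|stationery", "4125").
    // Returns null for lines that do not have exactly five fields.
    static Map.Entry<String, String> toKeyValue(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 5) {
            return null; // skip malformed records
        }
        String price = tokens[1];
        String year = tokens[2];
        String item = tokens[3];
        String type = tokens[4];
        return new SimpleEntry<>(year + "|" + item + "|" + type, price);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = toKeyValue("5 4125 2000 pen stationery");
        System.out.println(kv.getKey() + "\t" + kv.getValue());
    }
}
```

The question's mapper splits on "," and reads the item from tokens[13], neither of which matches the sample input shown. Splitting on whitespace and reading the four fields by position, as here, is one way to line the code up with that input.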

So the mapper output will be:

2015|pc|technology 50000
1998|mouse|technology 15424
2010|pen|technology 78420
2000|pen|stationery 8452
2000|pen|stationery 4125

Reducer:

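  1. For each key, sum the prices and count the values, then divide the sum by the count to get the average.
  2. Emit the average together with the other details (item, type, year).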
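The reducer steps above might look like the following in plain Java. This is a sketch under assumptions: the class AverageReducerLogic is hypothetical, the values are modeled as strings rather than Hadoop Text objects, and integer division is used because the expected output in the question is a whole number.

```java
import java.util.List;

// Sketch of the work done inside reduce() for a single key; in Hadoop the
// values would arrive as Iterable<Text> rather than Iterable<String>.
class AverageReducerLogic {

    // Builds the output line "item type average year" for one key.
    static String reduce(String key, Iterable<String> prices) {
        long sum = 0;
        long count = 0;
        for (String price : prices) {
            sum += Long.parseLong(price.trim());
            count++;
        }
        long average = sum / count; // integer division, as in the expected output

        // "|" is a regex metacharacter, so it has to be escaped for split().
        String[] parts = key.split("\\|");
        String year = parts[0];
        String item = parts[1];
        String type = parts[2];
        return item + " " + type + " " + average + " " + year;
    }

    public static void main(String[] args) {
        System.out.println(reduce("2000|pen|stationery", List.of("8452", "4125")));
        // prints "pen stationery 6288 2000"
    }
}
```

The escaped delimiter matters: String.split takes a regular expression, so split("|") as written in the question splits between every character instead of on the pipe.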

For example, the following key-value pairs go to the same reducer because they share a key:

2000|pen|stationery 8452
2000|pen|stationery 4125

The output will be:

pen stationery 6288 2000
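To check the expected output without a cluster, the whole flow (map, group by key, reduce) can be simulated in memory on the sample input. This is only a local sketch. LocalAverageJob is a hypothetical name, and the TreeMap stands in for Hadoop's shuffle and sort phase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of map -> shuffle -> reduce on the sample input; no Hadoop needed.
class LocalAverageJob {

    static List<String> run(List<String> lines) {
        // Map + shuffle: group prices by "year|item|type". A TreeMap keeps the
        // keys sorted, similar to how Hadoop sorts keys before the reduce phase.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length != 5) {
                continue; // skip malformed records
            }
            String key = t[2] + "|" + t[3] + "|" + t[4];
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(Long.parseLong(t[1]));
        }

        // Reduce: one average per key, formatted as "item type average year".
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long sum = 0;
            for (long price : e.getValue()) {
                sum += price;
            }
            long average = sum / e.getValue().size();
            String[] k = e.getKey().split("\\|");
            output.add(k[1] + " " + k[2] + " " + average + " " + k[0]);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "1 50000 2015 pc technology",
            "2 15424 1998 mouse technology",
            "3 78420 2010 pen technology",
            "4 8452 2000 pen stationery",
            "5 4125 2000 pen stationery");
        run(input).forEach(System.out::println);
    }
}
```

This prints one line per (year, item, type) group, including the two pen lines the question asks for. To restrict the result to pens, the map step could skip records whose item field is not "pen".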

For more on "java - Missing Map/Combine/Reduce logic: how to keep track of certain fields", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/34342898/
