gpt4 book ai didi

具有复合键的 Hadoop 困难

转载 作者:可可西里 更新时间:2023-11-01 15:39:42 28 4
gpt4 key购买 nike

我正在使用 Hadoop 分析 GSOD 数据 (ftp://ftp.ncdc.noaa.gov/pub/data/gsod/)。我选择了 5 年来执行我的实验 (2005 - 2009)。我配置了一个小集群并执行了一个简单的 MapReduce 程序,该程序获取了一年的最高温度记录。

现在我必须创建一个新的 MR 程序,为每个站点统计这些年来发生的所有现象。

我必须分析的文件具有以下结构:

STN--- ...  FRSHTO
722115 110001
722115 011001
722110 111000
722110 001000
722000 001000

STN 列表示站点代码,FRSHTT 表示现象:F - 雾,R - 雨或毛毛雨,S - 雪或冰粒,H - 冰雹,T - 雷声,O - Tornado 或漏斗云。

值为1,表示该现象发生在当天; 0,表示没有发生。

我需要找到如下结果:

722115: F = 1, R = 2, S = 1, O = 2
722110: F = 1, R = 1, S = 2
722000: S = 1

我可以运行 MR 程序,但结果是错误的,给我这些结果:

722115 F, 1
722115 R, 1
722115 R, 1
722115 S, 1
722115 O, 1
722115 O, 1
722110 F, 1
722110 R, 1
722110 S, 1
722110 S, 1
722000 S, 1

我用过这些代码:

映射器.java

public class Mapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, StationPhenomenun, IntWritable> {
@Override
protected void map(LongWritable key, Text value, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException {
String line = value.toString();
// Every file starts with a field description line, so, I ignore this line
if (!line.startsWith("STN---")) {
// First field of the line means the station code where data was collected
String station = line.substring(0, 6);
String fog = (line.substring(132, 133));
String rainOrDrizzle = (line.substring(133, 134));
String snowOrIcePellets = (line.substring(134, 135));
String hail = (line.substring(135, 136));
String thunder = (line.substring(136, 137));
String tornadoOrFunnelCloud = (line.substring(137, 138));

if (fog.equals("1"))
context.write(new StationPhenomenun(station,"F"), new IntWritable(1));
if (rainOrDrizzle.equals("1"))
context.write(new StationPhenomenun(station,"R"), new IntWritable(1));
if (snowOrIcePellets.equals("1"))
context.write(new StationPhenomenun(station,"S"), new IntWritable(1));
if (hail.equals("1"))
context.write(new StationPhenomenun(station,"H"), new IntWritable(1));
if (thunder.equals("1"))
context.write(new StationPhenomenun(station,"T"), new IntWritable(1));
if (tornadoOrFunnelCloud.equals("1"))
context.write(new StationPhenomenun(station,"O"), new IntWritable(1));
}
}
}

Reducer.java

public class Reducer extends org.apache.hadoop.mapreduce.Reducer<StationPhenomenun, IntWritable, StationPhenomenun, IntWritable> {

protected void reduce(StationPhenomenun key, Iterable<IntWritable> values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException {
int count = 0;
for (IntWritable value : values) {
count++;
}

String station = key.getStation().toString();
String occurence = key.getPhenomenun().toString();

StationPhenomenun textPair = new StationPhenomenun(station, occurence);
context.write(textPair, new IntWritable(count));
}
}

StationPhenomenum.java

public class StationPhenomenun implements WritableComparable<StationPhenomenun> {
private String station;
private String phenomenun;
public StationPhenomenun(String station, String phenomenun) {
this.station = station;
this.phenomenun = phenomenun;
}
public StationPhenomenun() {
}
public String getStation() {
return station;
}
public String getPhenomenun() {
return phenomenun;
}
@Override
public void readFields(DataInput in) throws IOException {
station = in.readUTF();
phenomenun = in.readUTF();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(station);
out.writeUTF(phenomenun);
}
@Override
public int compareTo(StationPhenomenun t) {
int cmp = this.station.compareTo(t.station);
if (cmp != 0) {
return cmp;
}
return this.phenomenun.compareTo(t.phenomenun);
}
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final StationPhenomenun other = (StationPhenomenun) obj;
if (this.station != other.station && (this.station == null || !this.station.equals(other.station))) {
return false;
}
if (this.phenomenun != other.phenomenun && (this.phenomenun == null || !this.phenomenun.equals(other.phenomenun))) {
return false;
}
return true;
}
@Override
public int hashCode() {
return this.station.hashCode() * 163 + this.phenomenun.hashCode();
}
}

NcdcJob.java

public class NcdcJob {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(NcdcJob.class);
FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/station"));
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setMapOutputKeyClass(StationPhenomenun.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(StationPhenomenun.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

有人做过类似的事情吗?

PS.: 我试过这个解决方案 ( Hadoop - composite key ) 但对我不起作用。

最佳答案

只需检查以下 2 个类是否与您的自定义实现相匹配。

 job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

我能够通过以下更改获得所需的结果

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

protected void reduce(StationPhenomenun key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

还将类名更改为 MyMapperMyReducer

722115,1,1,0,0,0,1
722115,0,1,1,0,0,1
722110,1,1,1,0,0,0
722110,0,0,1,0,0,0
722000,0,0,1,0,0,0

对于这个输入集,我可以得到以下结果

StationPhenomenun [station=722000, phenomenun=S]    1
StationPhenomenun [station=722110, phenomenun=F] 1
StationPhenomenun [station=722110, phenomenun=R] 1
StationPhenomenun [station=722110, phenomenun=S] 2
StationPhenomenun [station=722115, phenomenun=F] 1
StationPhenomenun [station=722115, phenomenun=O] 2
StationPhenomenun [station=722115, phenomenun=R] 2
StationPhenomenun [station=722115, phenomenun=S] 1

计算是一样的,你只需要自定义输出的显示方式。

关于具有复合键的 Hadoop 困难,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18381684/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com