java - Java-Hadoop Map Reduce, wrong input, omitting the CSV header

I am new to Hadoop and MapReduce, and while implementing this program I have run into a series of errors that I do not understand.

The program uses a dataset with the following structure: Barrios.csv

"Codigo de barrio";"Codigo de distrito al que pertenece";"Nombre de barrio";"Nombre acentuado del barrio";"Superficie (m2)";"Perimetro (m)"
"01";"01";"PALACIO ";"PALACIO ";"001471085";"005754"
"01";"02";"IMPERIAL ";"IMPERIAL ";"000967500";"004557"
"01";"03";"PACIFICO ";"PACÍFICO ";"000750065";"004005"
"01";"04";"RECOLETOS ";"RECOLETOS ";"000870857";"003927"
"01";"05";"EL VISO ";"EL VISO ";"001708046";"005269"
"01";"06";"BELLAS VISTAS ";"BELLAS VISTAS ";"000716261";"003443"
"01";"07";"GAZTAMBIDE ";"GAZTAMBIDE ";"000506596";"002969"
"01";"08";"EL PARDO ";"EL PARDO ";"187642916";"087125"
"01";"09";"CASA DE CAMPO ";"CASA DE CAMPO ";"017470075";"019233"
"01";"10";"LOS CARMENES ";"LOS CÁRMENES ";"001292235";"006186"
"01";"11";"COMILLAS ";"COMILLAS ";"000665999";"004257"
"01";"12";"ORCASITAS ";"ORCASITAS ";"001356371";"004664"
"01";"13";"ENTREVIAS ";"ENTREVÍAS ";"005996932";"011057"
"01";"14";"PAVONES ";"PAVONES ";"001016979";"004134"
"01";"15";"VENTAS ";"VENTAS ";"003198045";"008207"
"01";"16";"PALOMAS ";"PALOMAS ";"001128602";"004988"
"01";"17";"SAN ANDRES ";"SAN ANDRÉS ";"009192451";"013710"
"01";"18";"CASCO H.VALLECAS ";"CASCO H.VALLECAS ";"049359337";"031924"
"01";"19";"CASCO H.VICALVARO ";"CASCO H.VICÁLVARO ";"032924620";"033326"
"01";"20";"SIMANCAS ";"SIMANCAS ";"002278418";"006678"
"01";"21";"ALAMEDA DE OSUNA ";"ALAMEDA DE OSUNA ";"001961904";"006043"

This represents the different neighborhoods of Madrid and lists a series of values for each one, such as its perimeter, total surface area, and so on.

In my MapReduce program I want to obtain the average perimeter of the neighborhoods grouped by "Codigo de barrio": for example, the average perimeter of all the neighborhoods whose "Codigo de barrio" is 1, then of all those where it is 2, and so on (the perimeter is the value in the last column).

This is my code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    private static final String SEPARATOR = ";";

    public static class BarrioMapper extends Mapper<Object, Text, IntWritable, IntWritable> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Split each csv line on ";" and emit (neighborhood code, perimeter).
            final String[] values = value.toString().split(SEPARATOR);

            final int grupoBarrio = Integer.parseInt(values[0]);
            final int perimetro = Integer.parseInt(values[5]);

            context.write(new IntWritable(grupoBarrio), new IntWritable(perimetro));
        }
    }

    public static class BarrioReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Average the perimeters of all neighborhoods sharing the same code.
            int sum = 0;
            int contador = 0;

            for (IntWritable value : values) {
                sum += value.get();
                contador++;
            }

            if (contador > 0) {
                result.set(sum / contador);
                context.write(key, result);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(BarrioMapper.class);
        job.setCombinerClass(BarrioReducer.class);
        job.setReducerClass(BarrioReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I am handling everything as IntWritable. My problem comes when I pass the data and the directories to run it on Hadoop with the following command:

yarn jar WordCount.jar uam.WordCount Barrios.csv outPutDir

I get this error:
INFO mapreduce.Job: Task Id : attempt_1487862618135_1006_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "Codigo de barrio"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at uam.WordCount$BarrioMapper.map(WordCount.java:20)
at uam.WordCount$BarrioMapper.map(WordCount.java:15)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The error refers to the "Codigo de barrio" input data, and I do not understand what it means.

Best Answer

Your split is going wrong here:

final int grupoBarrio = Integer.parseInt(values[0]);
values[0] of the first line is "Codigo de barrio", which is not a numeric value; you should skip the header (the first row) of the csv file.
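
A minimal sketch of one way to do that inside the mapper (my own illustration, not code from the original answer): let the parse failure itself identify the header line and skip it. Note that the sample rows also wrap every field in double quotes, so those likely need to be stripped before Integer.parseInt as well:

public static class BarrioMapper extends Mapper<Object, Text, IntWritable, IntWritable> {

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        final String[] values = value.toString().split(SEPARATOR);

        try {
            // Strip the surrounding double quotes first: fields such as "01"
            // would otherwise fail to parse even on data rows.
            final int grupoBarrio = Integer.parseInt(values[0].replace("\"", "").trim());
            final int perimetro = Integer.parseInt(values[5].replace("\"", "").trim());
            context.write(new IntWritable(grupoBarrio), new IntWritable(perimetro));
        } catch (NumberFormatException e) {
            // The header line ("Codigo de barrio";...) ends up here and is skipped.
        }
    }
}

An alternative is to test whether the first field equals the literal header name, but catching NumberFormatException also protects against any other malformed line in the input.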

Regarding java - Java-Hadoop Map Reduce, wrong input, omitting the CSV header, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47555393/
