
java - Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text


My program looks like this:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);

        job.setMapperClass(MapClass.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}

The data looks like this:

"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
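As a quick sanity check of the column indices the mapper relies on, splitting one of the sample rows by commas shows that fields[1] is GYEAR and fields[8] is CLAIMS (which is empty in these sample rows, so the mapper's length guard would skip them). This small standalone snippet, independent of Hadoop, illustrates that:

```java
// Standalone check of the comma-split indices used in the mapper.
public class FieldCheck {
    public static void main(String[] args) {
        // Second sample row from the data above
        String row = "3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,";
        String[] fields = row.split(",");
        System.out.println("year = " + fields[1]);            // GYEAR column
        System.out.println("claims = [" + fields[8] + "]");   // CLAIMS column, empty here
    }
}
```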

When I run this program, I see the following on the console:

12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I believe the class types are mapped correctly for the Mapper class.

Please let me know what I am doing wrong here.

Best Answer

When you read a file with an M/R program, the input key of your mapper is the byte offset of the line in the file, while the input value is the full line.

So what is happening here is that you are declaring the line offset as a Text object, which is wrong; you need a LongWritable instead so that Hadoop doesn't complain about the types.

Try this:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    ...
}

One more thing you may want to reconsider in your code: you are creating two Text objects for every record you process. You should create these two objects just once, and set their values inside your mapper with the set method. This will save you a good amount of time if you are processing a decent amount of data.
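The object-reuse advice above could look roughly like the following (a sketch only, assuming the same Hadoop imports as the original code; it is not a tested, complete job):

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        // Allocated once per mapper instance and reused for every record,
        // instead of constructing two new Text objects per map() call.
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && !claims.startsWith("\"")) {
                outKey.set(year);      // overwrite the reused objects in place
                outValue.set(claims);
                context.write(outKey, outValue);
            }
        }
    }

This is safe because the framework serializes the key/value pair during context.write, so the objects can be overwritten on the next call.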

Regarding "java - Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11784729/
