
hadoop - MapReduce - Reading files from a provided path

Reposted. Author: 可可西里. Updated: 2023-11-01 15:28:49

I am using the code below to read a file from a provided path inside my mapper. The code was mentioned in a similar question.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.mapred.MapReduceBase;

import java.util.StringTokenizer;

public class StubDriver {

    // Main Method

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration(); // Configuration Object
        Job job = new Job(conf, "My Program");
        FileSystem fs = FileSystem.get(conf);
        job.setJarByClass(StubDriver.class);
        job.setMapperClass(Map1.class);
        // job.setPartitionClass(Part1);
        // job.setReducerClass(Reducer1);
        // job.setNumReduceTasks(3);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        TextInputFormat.addInputPath(job, new Path(args[0]));
        TextOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }

    // Mapper

    public static class Map1 extends Mapper<LongWritable, Text, IntWritable, Text> {

        public void setup(Context context) throws IOException {

            Path pt = new Path("hdfs://quickstart.cloudera:8020/dhawalhdfs/input/*");
            FileSystem fs = FileSystem.get(new Configuration());
            BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
            String line;
            line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        }

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            StringTokenizer tokenizer = new StringTokenizer(value.toString());

            String a = tokenizer.nextToken();
            String b = tokenizer.nextToken();
            String c = tokenizer.nextToken();
            String d = tokenizer.nextToken();
            String e = tokenizer.nextToken();

            context.write(new IntWritable(Integer.parseInt(c)), new Text(a + "\t" + b + "\t" + d + "\t" + e));
        }
    }
}
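One thing worth noting about the setup() above (my observation, not raised in the original thread): FileSystem.open() does not expand globs, so a path ending in /* is looked up literally as a file named "*". A minimal glob-aware sketch, assuming the same imports and cluster URI as the question's code:

    // Hypothetical rewrite of setup(): expand the glob with globStatus()
    // first, then open each matching file individually.
    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        Path pattern = new Path("hdfs://quickstart.cloudera:8020/dhawalhdfs/input/*");
        FileSystem fs = FileSystem.get(pattern.toUri(), conf);
        FileStatus[] matches = fs.globStatus(pattern); // null if nothing matches
        if (matches == null) {
            return;
        }
        for (FileStatus status : matches) {
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(fs.open(status.getPath())))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line); // goes to the task's stdout log
                }
            }
        }
    }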

The code compiles successfully, but I get an error when submitting the job. Since the input path is provided inside my program, I tried to submit only the output path, like this:

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs/dhawal000

I get the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at StubDriver.main(StubDriver.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Best Answer

It's a simple mistake... :-)

new Path(args[1]) is the source of the error: you pass only one element in the args array but try to read the second one.
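To make that concrete, here is a sketch (my illustration, not from the original answer) of what the driver receives from the one-argument invocation:

    String[] args = { "/dhawalhdfs/dhawal000" }; // what the one-argument submission produces
    String input  = args[0]; // "/dhawalhdfs/dhawal000" -- consumed as the input path
    String output = args[1]; // throws ArrayIndexOutOfBoundsException: 1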

You access the arguments in your stub driver as follows:

TextInputFormat.addInputPath(job, new Path(args[0]));
TextOutputFormat.setOutputPath(job, new Path(args[1]));

But you pass only one argument to the driver, like this:

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs/dhawal000

Ideally, you should pass both arguments, separated by a space:

hadoop jar /home/cloudera/dhawal/MR/Par.jar StubDriver /dhawalhdfs /dhawal000
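As a minimal hardening step (an addition of mine, not part of the original answer), the driver can validate args before building the job, so a missing argument produces a usage message instead of an ArrayIndexOutOfBoundsException:

    // Hypothetical guard at the top of StubDriver.main, before the paths are used.
    if (args.length != 2) {
        System.err.println("Usage: StubDriver <input path> <output path>");
        System.exit(2); // conventional exit code for incorrect usage
    }
    TextInputFormat.addInputPath(job, new Path(args[0]));   // input path
    TextOutputFormat.setOutputPath(job, new Path(args[1])); // output path

With this check in place, the one-argument submission above fails immediately with a clear message rather than a stack trace.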

Regarding hadoop - MapReduce - Reading files from a provided path, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38616820/
