
java - Running a Hadoop job without creating a jar file

Reposted. Author: 可可西里. Updated: 2023-11-01 15:55:57

I'm a Hadoop beginner and have only worked through a few tutorial projects. I started with Python, where I could pass the mapper and reducer scripts to the job separately:

    hadoop jar /usr/local/hadoop/hadoop-2.8.0/share/hadoop/tools/lib/hadoop-streaming-2.8.0.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input input1 -output joboutput

I want to do the same thing in Java, but every tutorial I can find goes through building a jar file, and I haven't found any way to debug the Java mapper and reducer code. Is there any way to test the code using some debug option?

Here is what I have so far.

Sample input file in csv

Mapper code

    package SalesCountry;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String valueString = value.toString();
            String[] singleCountryData = valueString.split(",");
            // field 7 = country, field 2 = price
            output.collect(new Text(singleCountryData[7]), new IntWritable(Integer.parseInt(singleCountryData[2])));
        }
    }

Reducer code

    package SalesCountry;

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int salesForCountry = 0;
            while (values.hasNext()) {
                // the iterator is already typed as Iterator<IntWritable>, so no cast is needed
                salesForCountry += values.next().get();
            }
            output.collect(t_key, new IntWritable(salesForCountry));
        }
    }

Terminal output

$HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales
17/05/18 12:52:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/18 12:52:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/05/18 12:52:47 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
17/05/18 12:52:47 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/05/18 12:52:47 INFO mapred.FileInputFormat: Total input files to process : 1
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: number of splits:1
17/05/18 12:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1862814770_0001
17/05/18 12:52:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/05/18 12:52:47 INFO mapreduce.Job: Running job: job_local1862814770_0001
17/05/18 12:52:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Waiting for map tasks
17/05/18 12:52:47 INFO mapred.LocalJobRunner: Starting task: attempt_local1862814770_0001_m_000000_0
17/05/18 12:52:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/05/18 12:52:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/05/18 12:52:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/05/18 12:52:47 INFO mapred.MapTask: Processing split: file:/home/deevita/MapReduceTutorial/inputMapReduce/SalesJan2009.csv:0+123638
17/05/18 12:52:47 INFO mapred.MapTask: numReduceTasks: 1
17/05/18 12:52:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/05/18 12:52:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/05/18 12:52:47 INFO mapred.MapTask: soft limit at 83886080
17/05/18 12:52:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/05/18 12:52:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/05/18 12:52:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/05/18 12:52:47 INFO mapred.LocalJobRunner: map task executor complete.
17/05/18 12:52:47 WARN mapred.LocalJobRunner: job_local1862814770_0001
java.lang.Exception: java.lang.NumberFormatException: For input string: "Price"
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.NumberFormatException: For input string: "Price"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at SalesCountry.SalesMapper.map(SalesMapper.java:17)
at SalesCountry.SalesMapper.map(SalesMapper.java:10)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 running in uber mode : false
17/05/18 12:52:48 INFO mapreduce.Job: map 0% reduce 0%
17/05/18 12:52:48 INFO mapreduce.Job: Job job_local1862814770_0001 failed with state FAILED due to: NA
17/05/18 12:52:48 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
deevita@deevita-OptiPlex-7040:~/MapReduceTutorial$ $HADOOP_HOME/bin/hadoop jar TotalSalePerCountry.jar inputMapReduce mapreduce_output_sales
17/05/18 16:15:12 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/18 16:15:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/05/18 16:15:12 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/deevita/MapReduceTutorial/mapreduce_output_sales already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:141)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
at SalesCountry.SalesCountryDriver.main(SalesCountryDriver.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I know this is a NumberFormatException, but to recompile I have to rebuild the jar file every time. Is there a way to run the MapReduce code on each change without building a jar?

Best Answer

The NumberFormatException comes from a non-numeric field value — the log shows the failing string is "Price", i.e. the CSV header row — and stray whitespace can trigger the same error, so trim fields before parsing and skip the header line.
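To make that concrete: judging from the log (`For input string: "Price"`), the parse fails on the header row. A minimal plain-Java sketch of a defensive parse (class and method names are hypothetical, not part of the original code):

```java
public class PriceField {
    // Returns the parsed price, or null for rows that cannot be parsed,
    // such as the CSV header line where field 2 is the literal "Price".
    static Integer parsePrice(String csvLine) {
        String[] fields = csvLine.split(",");
        if (fields.length < 8) {
            return null;                     // malformed row: too few fields
        }
        String raw = fields[2].trim();       // trim stray whitespace first
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return null;                     // header row or bad data: skip
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("Transaction_date,Product,Price,Payment_Type,Name,City,State,Country")); // null
        System.out.println(parsePrice("date,product,1200,Visa,carolina,baslidoni,england,UK")); // 1200
    }
}
```

In the real mapper you would simply return from `map()` when the helper yields `null`, so the header row produces no output instead of killing the task.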

I suggest writing unit tests for your jobs, so you can debug them without going through the whole jar-build/deploy cycle each time.

Here is an example using MRUnit. Add the dependency:

    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.0.0</version>
        <classifier>hadoop1</classifier>
        <scope>test</scope>
    </dependency>

And the test:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class HadoopTest {
        MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            SalesMapper mapper = new SalesMapper();
            mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
            mapDriver.setMapper(mapper);
        }

        @Test
        public void testMapper() throws Exception {
            mapDriver.withInput(new LongWritable(1), new Text("date,product,1200,Visa,carolina,baslidoni,england,UK"));
            mapDriver.withOutput(new Text("UK"), new IntWritable(1200));
            mapDriver.runTest();
        }
    }
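For even faster iteration with no Hadoop dependency at all, the core map-then-reduce logic (split by comma, group by country, sum prices) can be sanity-checked in plain Java. This is an illustrative harness, not part of the original job; the field indices (2 = price, 7 = country) match the mapper above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java harness (no Hadoop, no jar): applies the same split/group/sum
// logic as SalesMapper + SalesCountryReducer to a few sample CSV lines.
public class LocalSalesCheck {
    static Map<String, Integer> totalSalesPerCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            // field 7 = country key, field 2 = sale amount
            totals.merge(f[7].trim(), Integer.parseInt(f[2].trim()), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "date,product,1200,Visa,carolina,baslidoni,england,UK",
            "date,product,800,Visa,john,smith,london,UK",
            "date,product,500,Amex,marie,curie,paris,France");
        System.out.println(totalSalesPerCountry(lines));
        // {France=500, UK=2000}
    }
}
```

Once the logic passes here and in the MRUnit test, building the jar becomes a final packaging step rather than part of the debug loop.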

On java - Running a Hadoop job without creating a jar file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44045856/
