After changing the properties in mapred-site.xml, I passed a tar.bz2 file, a .gz file, and a tar.gz file as input. None of them seemed to work. My assumption is that Hadoop reads the records out of order from the compressed input: one column of the input is a string and the other is an integer, but because some of the data read from the compressed file is out of order, at some point Hadoop reads part of a string as an integer and throws a NumberFormatException. I am just a beginner, and I would like to know whether the problem is in my configuration or in my code.
The property in core-site.xml is:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
The properties in mapred-site.xml are:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
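For reference, these mapred.* keys still work on Hadoop 2.x but are deprecated (the job output further down prints the corresponding warnings). A minimal sketch of the same map-output compression settings applied programmatically with the newer property names; the key names here are the standard Hadoop 2.x replacements, not taken from this post, and the snippet reuses the imports from MySort.java below:

Configuration conf = new Configuration();
// replaces mapred.compress.map.output
conf.setBoolean("mapreduce.map.output.compress", true);
// replaces mapred.map.output.compression.codec
conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);
// replaces mapred.output.compression.type (only used by SequenceFile-based output formats)
conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");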
Here is my code:
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.util.NativeCodeLoader;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.io.compress.BZip2Codec;
public class MySort {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable Marks = new IntWritable();
        private Text name = new Text();
        String one, two;
        int num;

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                one = tokenizer.nextToken();
                name.set(one);
                if (tokenizer.hasMoreTokens())
                    two = tokenizer.nextToken();
                num = Integer.parseInt(two);
                Marks.set(num);
                context.write(name, Marks);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // conf.set("mapreduce.job.inputformat.class", "com.wizecommerce.utils.mapred.TextInputFormat");
        // conf.set("mapreduce.job.outputformat.class", "com.wizecommerce.utils.mapred.TextOutputFormat");
        // conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setBoolean("mapred.output.compress", true);
        // conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);
        // conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapred.output.compression.type", "BLOCK");
        // conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);
        // conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
        conf.setClass("mapred.map.output.compression.codec", BZip2Codec.class, CompressionCodec.class);

        Job job = new Job(conf, "mysort");
        job.setJarByClass(org.myorg.MySort.class);
        job.setJobName("mysort");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // FileInputFormat.setCompressInput(job, true);
        FileOutputFormat.setCompressOutput(job, true);
        // FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        // conf.set("mapred.output.compression.type", CompressionType.BLOCK.toString());
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
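Purely as an illustration (this guard is not in the original code): the NumberFormatException in the job output below comes from the Integer.parseInt call in map() when the second token is not numeric. A defensive variant of the token loop could skip such pairs instead of failing the whole task:

while (tokenizer.hasMoreTokens()) {
    String first = tokenizer.nextToken();
    if (!tokenizer.hasMoreTokens()) {
        break;                        // odd trailing token: nothing to parse
    }
    String second = tokenizer.nextToken();
    try {
        Marks.set(Integer.parseInt(second));
    } catch (NumberFormatException e) {
        continue;                     // non-numeric value: skip this pair
    }
    name.set(first);
    context.write(name, Marks);
}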
These are all the commands, collected in a Makefile:
run: all
	-sudo ./a.out
	sudo chmod 777 -R Data
	-sudo rm data.tar.bz2
	sudo tar -cvjf data.tar.bz2 Data/data.txt
	sudo javac -classpath /home/hduser/12115_Select_Query/hadoop-core-1.1.2.jar -d mysort MySort.java
	sudo jar -cvf mysort.jar -C mysort/ .
	-hadoop fs -rmr MySort/output
	-hadoop fs -rmr MySort/input
	hadoop fs -mkdir MySort/input
	hadoop fs -put data.tar.bz2 MySort/input
	hadoop jar mysort.jar org.myorg.MySort MySort/input/ MySort/output
	-sudo rm /home/hduser/Out/sort.txt
	hadoop fs -copyToLocal MySort/output/part-r-00000 /home/hduser/Out/sort.txt
	sudo gedit /home/hduser/Out/sort.txt

all: rdata.c
	-sudo rm a.out
	-gcc rdata.c -o a.out

exec: run

.PHONY: exec run
Command:
hadoop jar mysort.jar org.myorg.MySort MySort/input/ MySort/output
This is the output:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/25 11:20:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/25 11:20:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/25 11:20:29 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/06/25 11:20:29 INFO input.FileInputFormat: Total input paths to process : 1
14/06/25 11:20:29 INFO mapreduce.JobSubmitter: number of splits:1
14/06/25 11:20:29 INFO Configuration.deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
14/06/25 11:20:29 INFO Configuration.deprecation: mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
14/06/25 11:20:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1403675322820_0001
14/06/25 11:20:30 INFO impl.YarnClientImpl: Submitted application application_1403675322820_0001
14/06/25 11:20:30 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1403675322820_0001/
14/06/25 11:20:30 INFO mapreduce.Job: Running job: job_1403675322820_0001
14/06/25 11:20:52 INFO mapreduce.Job: Job job_1403675322820_0001 running in uber mode : false
14/06/25 11:20:52 INFO mapreduce.Job: map 0% reduce 0%
14/06/25 11:21:10 INFO mapreduce.Job: Task Id : attempt_1403675322820_0001_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "0ustar"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.myorg.MySort$Map.map(MySort.java:36)
at org.myorg.MySort$Map.map(MySort.java:23)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
14/06/25 11:21:29 INFO mapreduce.Job: Task Id : attempt_1403675322820_0001_m_000000_1, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "0ustar"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.myorg.MySort$Map.map(MySort.java:36)
at org.myorg.MySort$Map.map(MySort.java:23)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
14/06/25 11:21:49 INFO mapreduce.Job: Task Id : attempt_1403675322820_0001_m_000000_2, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "0ustar"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.myorg.MySort$Map.map(MySort.java:36)
at org.myorg.MySort$Map.map(MySort.java:23)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
14/06/25 11:22:10 INFO mapreduce.Job: map 100% reduce 100%
14/06/25 11:22:10 INFO mapreduce.Job: Job job_1403675322820_0001 failed with state FAILED due to: Task failed task_1403675322820_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/06/25 11:22:10 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=69797
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=69797
Total vcore-seconds taken by all map tasks=69797
Total megabyte-seconds taken by all map tasks=71472128
I also tried this:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.3.0.jar -Dmapred.output.compress=true -Dmapred.compress.map.output=true -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec -Dmapred.reduce.tasks=0 -input MySort/input/data.txt -output MySort/zip1
This created the compressed files successfully:
hadoop fs -ls MySort/zip1
Found 3 items
-rw-r--r-- 1 hduser supergroup 0 2014-06-25 10:43 MySort/zip1/_SUCCESS
-rw-r--r-- 1 hduser supergroup 42488018 2014-06-25 10:43 MySort/zip1/part-00000.bz2
-rw-r--r-- 1 hduser supergroup 42504084 2014-06-25 10:43 MySort/zip1/part-00001.bz2
Then I ran:
hadoop jar mysort.jar org.myorg.MySort MySort/input/ MySort/zip1
It still does not work. Am I missing something here?
It works fine when I run it without the bz2 compressed file and pass the text file Data/data.txt directly, i.e. upload it to MySort/input in HDFS (hadoop fs -put Data/data.txt MySort/input).
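One experiment worth trying (not something from the original post): feed the job a plain .bz2 of the text file rather than a tar.bz2, since a tar archive still contains tar headers after the codec has decompressed it. A small, hypothetical helper along these lines could write such a file into HDFS; the paths are only examples, and besides the codec classes imported above it needs FileSystem, ReflectionUtils, IOUtils and java.io:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
CompressionCodec codec = ReflectionUtils.newInstance(BZip2Codec.class, conf);
InputStream in = new FileInputStream("Data/data.txt");   // local text file (example path)
OutputStream out = codec.createOutputStream(fs.create(new Path("MySort/input/data.txt.bz2")));
IOUtils.copyBytes(in, out, 4096, true);                   // copy and close both streams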
Any help is appreciated.
Best Answer
I did some work on this and got it running. I used ToolRunner.
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.util.NativeCodeLoader;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class ToolMapReduce extends Configured implements Tool {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable Marks = new IntWritable();
        private Text name = new Text();
        String one, two;
        int num;

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                one = tokenizer.nextToken();
                name.set(one);
                if (tokenizer.hasMoreTokens())
                    two = tokenizer.nextToken();
                num = Integer.parseInt(two);
                Marks.set(num);
                context.write(name, Marks);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ToolMapReduce(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        // Configuration conf = new Configuration();
        // conf.setOutputFormat(SequenceFileOutputFormat.class);
        // SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
        // SequenceFileOutputFormat.setCompressOutput(conf, true);
        // conf.set("mapred.output.compress", "true");
        // conf.set("mapred.output.compression", "org.apache.hadoop.io.compress.SnappyCodec");
        // conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");
        // conf.set("mapreduce.job.inputformat.class", "com.wizecommerce.utils.mapred.TextInputFormat");
        // conf.set("mapreduce.job.outputformat.class", "com.wizecommerce.utils.mapred.TextOutputFormat");
        // conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setBoolean("mapred.output.compress", true);
        // conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);
        // conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapred.output.compression.type", "BLOCK");
        // conf.setClass("mapreduce.map.output.compress.codec", BZip2Codec.class, CompressionCodec.class);
        // conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
        conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);

        Job job = new Job(conf, "mysort");
        job.setJarByClass(org.myorg.ToolMapReduce.class);
        // job.setJarByClass(org.myorg.MySort.class);
        job.setJobName("mysort");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // FileInputFormat.setCompressInput(job, true);
        FileOutputFormat.setCompressOutput(job, true);
        // FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        // conf.set("mapred.output.compression.type", CompressionType.BLOCK.toString());
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
        // job.waitForCompletion(true);
    }
}
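Running the job through ToolRunner also takes care of the warning in the job output above ("Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner..."): the arguments go through GenericOptionsParser first, so properties such as the compression settings can be supplied at submit time with -D (for example -Dmapreduce.output.fileoutputformat.compress=true) instead of being hard-coded in the driver.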
Regarding "hadoop - Running Hadoop with compressed files as input; the input data Hadoop reads is out of order; NumberFormatException", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24401674/