hadoop - How do I generate 10 GB of random text in Hadoop?

Reposted. Author: 行者123. Last updated: 2023-12-02 20:17:50

I want to use Hadoop to generate 10 GB of random text.
However, when I run the command below, the result is only 1.0 GB of random text.

hadoop jar hadoop-mapreduce-examples-*.jar randomwriter /user/root/random7

As far as I know, the randomwriter example generates 10 GB of random text by default.
How can I fix this?
The output of the command is shown below.
20/06/11 17:15:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
20/06/11 17:15:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Running 10 maps.
Job started: Thu Jun 11 17:15:45 KST 2020
20/06/11 17:15:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
20/06/11 17:15:45 INFO mapreduce.JobSubmitter: number of splits:1
20/06/11 17:15:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local203266267_0001
20/06/11 17:15:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
20/06/11 17:15:46 INFO mapreduce.Job: Running job: job_local203266267_0001
20/06/11 17:15:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null
20/06/11 17:15:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/06/11 17:15:46 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/06/11 17:15:46 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/06/11 17:15:46 INFO mapred.LocalJobRunner: Waiting for map tasks
20/06/11 17:15:46 INFO mapred.LocalJobRunner: Starting task: attempt_local203266267_0001_m_000000_0
20/06/11 17:15:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/06/11 17:15:46 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
20/06/11 17:15:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
20/06/11 17:15:46 INFO mapred.MapTask: Processing split: hdfs://master:9000/user/root/random7/dummy-split-0:0+1
20/06/11 17:15:47 INFO mapreduce.Job: Job job_local203266267_0001 running in uber mode : false
20/06/11 17:15:47 INFO mapreduce.Job: map 0% reduce 0%
20/06/11 17:15:58 INFO mapred.LocalJobRunner: wrote record 94600. 78409377 bytes left. > map
20/06/11 17:15:59 INFO mapred.LocalJobRunner: wrote record 94600. 78409377 bytes left. > map
20/06/11 17:15:59 INFO mapred.Task: Task:attempt_local203266267_0001_m_000000_0 is done. And is in the process of committing
20/06/11 17:15:59 INFO mapred.LocalJobRunner: wrote record 94600. 78409377 bytes left. > map
20/06/11 17:15:59 INFO mapred.Task: Task attempt_local203266267_0001_m_000000_0 is allowed to commit now
20/06/11 17:15:59 INFO output.FileOutputCommitter: Saved output of task 'attempt_local203266267_0001_m_000000_0' to hdfs://master:9000/user/root/random7/_temporary/0/task_local203266267_0001_m_000000
20/06/11 17:15:59 INFO mapred.LocalJobRunner: done with 102093 records.
20/06/11 17:15:59 INFO mapred.Task: Task 'attempt_local203266267_0001_m_000000_0' done.
20/06/11 17:15:59 INFO mapred.LocalJobRunner: Finishing task: attempt_local203266267_0001_m_000000_0
20/06/11 17:15:59 INFO mapred.LocalJobRunner: map task executor complete.
20/06/11 17:16:00 INFO mapreduce.Job: map 100% reduce 0%
20/06/11 17:16:00 INFO mapreduce.Job: Job job_local203266267_0001 completed successfully
20/06/11 17:16:00 INFO mapreduce.Job: Counters: 22
File System Counters
FILE: Number of bytes read=303475
FILE: Number of bytes written=765149
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=1077285240
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=1
Map output records=102093
Input split bytes=115
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=28
Total committed heap usage (bytes)=219676672
org.apache.hadoop.examples.RandomWriter$Counters
BYTES_WRITTEN=1073754436
RECORDS_WRITTEN=102093
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1077285240
Job ended: Thu Jun 11 17:16:00 KST 2020
The job took 14 seconds.

Best answer

This command is documented in its source code:
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/RandomWriter.java
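The defaults described in that source file can be worked through as simple arithmetic. The sketch below assumes the defaults of 10 maps per host (mapreduce.randomwriter.mapsperhost) and 1 GiB per map (mapreduce.randomwriter.bytespermap), and it assumes a single node; the node count is a guess about the asker's setup. BYTES_WRITTEN is copied from the job counters in the question.

```shell
#!/bin/sh
# Assumed defaults of the randomwriter example (see the lead-in above).
MAPS_PER_HOST=10
BYTES_PER_MAP=$((1024 * 1024 * 1024))   # 1 GiB
NODES=1                                  # assumption: single-node setup

# Total the job would write if every planned map actually ran.
EXPECTED=$((MAPS_PER_HOST * BYTES_PER_MAP * NODES))

# Counter value taken from the log in the question.
BYTES_WRITTEN=1073754436

# Whole multiples of the per-map size that were actually written.
MAPS_WRITTEN=$((BYTES_WRITTEN / BYTES_PER_MAP))

echo "expected total bytes: $EXPECTED"
echo "maps' worth of data written: $MAPS_WRITTEN"
```

The counter amounts to one map's worth of data. That matches the "number of splits:1" line in the log, even though the job printed "Running 10 maps."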

You can set mapreduce.randomwriter.totalbytes in an XML configuration file under HADOOP_CONF.
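Because the example is launched through Hadoop's ToolRunner, the same property can presumably also be passed on the command line with the generic -D option instead of editing an XML file. The sketch below only builds and prints such a command; it does not run it. The output path /user/root/random10g is hypothetical. Setting mapreduce.randomwriter.bytespermap as well is an untested assumption: the log in the question shows a local job (job_local..., LocalJobRunner) with a single split, so raising the per-map size is one way a single map could write the full amount.

```shell
#!/bin/sh
# 10 GiB expressed in bytes.
TOTAL_BYTES=$((10 * 1024 * 1024 * 1024))

# Hypothetical output directory; it must not already exist in HDFS.
OUT_DIR=/user/root/random10g

# -D options must come before the output directory argument.
# bytespermap is set to the same value on the assumption that only
# one map task runs, as in the log shown in the question.
CMD="hadoop jar hadoop-mapreduce-examples-*.jar randomwriter \
-D mapreduce.randomwriter.totalbytes=${TOTAL_BYTES} \
-D mapreduce.randomwriter.bytespermap=${TOTAL_BYTES} \
${OUT_DIR}"

# Print the command so it can be reviewed before running it.
echo "$CMD"
```

On a cluster where several map tasks run in parallel, the bytespermap override could be dropped so that the work is spread across maps.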

However, you will more likely want to write your own main method to run this task.

Regarding "hadoop - How do I generate 10 GB of random text in Hadoop?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62320062/
