I'm writing a Java application on Hadoop 1.1.1 (Ubuntu) that compares strings to find the longest common substring. I've successfully run the map and reduce phases on small data sets, but whenever I increase the size of the input, my reduce output never appears in the target output directory. It gives no hint at all, which makes this all the stranger. I'm running everything in Eclipse, with one mapper and one reducer.
My reducer finds the longest common substring in a collection of strings, then emits that substring as the key and the index of each string that contains it as the value. A short example:
Input data:
0: ALPHAA
1: ALPHAB
2: ALZHA
Expected output:
Key: ALPHA Value: 0
Key: ALPHA Value: 1
Key: AL Value: 0
Key: AL Value: 1
Key: AL Value: 2
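The matching step described above can be sketched outside Hadoop with the classic dynamic-programming longest-common-substring algorithm. This is a minimal plain-Java sketch of the comparison only (the class and method names are mine, not the asker's); the real reducer would run comparisons like these over the whole collection of strings and emit key/value pairs as shown in the example:

```java
public class LongestCommonSubstring {
    // Classic O(m*n) dynamic programming: dp[i][j] is the length of the common
    // suffix of a[0..i) and b[0..j); the maximum over all cells is the answer.
    static String lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int best = 0, endA = 0; // length and end position (in a) of the best match
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > best) {
                        best = dp[i][j];
                        endA = i;
                    }
                }
            }
        }
        return a.substring(endA - best, endA);
    }

    public static void main(String[] args) {
        // Pairwise comparisons over the example input above.
        System.out.println(lcs("ALPHAA", "ALPHAB")); // ALPHA
        System.out.println(lcs("ALPHAA", "ALZHA"));  // AL
    }
}
```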
The console log from the run in Eclipse:
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Total input paths to process : 1
Running job: job_local_0001
setsid exited with exit code 0
Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@411fd5a3
Snappy native library not loaded
io.sort.mb = 100
data buffer = 79691776/99614720
record buffer = 262144/327680
map 0% reduce 0%
Spilling map output: record full = true
bufstart = 0; bufend = 22852573; bufvoid = 99614720
kvstart = 0; kvend = 262144; length = 327680
Finished spill 0
Starting flush of map output
Finished spill 1
Merging 2 sorted segments
Down to the last merge-pass, with 2 segments left of total size: 28981648 bytes
Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
Task attempt_local_0001_m_000000_0 done.
Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3aff2f16
Merging 1 sorted segments
Down to the last merge-pass, with 1 segments left of total size: 28981646 bytes
map 100% reduce 0%
reduce > reduce
map 100% reduce 66%
reduce > reduce
map 100% reduce 67%
reduce > reduce
reduce > reduce
map 100% reduce 68%
reduce > reduce
reduce > reduce
reduce > reduce
map 100% reduce 69%
reduce > reduce
reduce > reduce
map 100% reduce 70%
reduce > reduce
job_local_0001
Job complete: job_local_0001
Counters: 22
File Output Format Counters
Bytes Written=14754916
FileSystemCounters
FILE_BYTES_READ=61475617
HDFS_BYTES_READ=97361881
FILE_BYTES_WRITTEN=116018418
HDFS_BYTES_WRITTEN=116746326
File Input Format Counters
Bytes Read=46366176
Map-Reduce Framework
Reduce input groups=27774
Map output materialized bytes=28981650
Combine output records=0
Map input records=4629524
Reduce shuffle bytes=0
Physical memory (bytes) snapshot=0
Reduce output records=832559
Spilled Records=651304
Map output bytes=28289481
CPU time spent (ms)=0
Total committed heap usage (bytes)=2578972672
Virtual memory (bytes) snapshot=0
Combine input records=0
Map output records=325652
SPLIT_RAW_BYTES=136
Reduce input records=27774
reduce > reduce
reduce > reduce
Best answer
I put the reduce() and map() logic inside a try-catch block, where the catch block increments a counter whose group is "Exceptions" and whose name is the exception message. This gave me a quick way (just by scanning the counter list) to see which exceptions, if any, were thrown.
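The counter-in-catch pattern from the answer can be sketched in plain Java. The `Map` below is a stand-in for Hadoop's dynamic counters (in a real reducer you would call something like `context.getCounter("Exceptions", e.getMessage()).increment(1)` instead), and the empty-record check is a hypothetical failure mode, not the asker's actual reduce logic:

```java
import java.util.HashMap;
import java.util.Map;

public class ExceptionCounting {
    // Stand-in for Hadoop's job counters, keyed as "group:name".
    static final Map<String, Long> counters = new HashMap<>();

    static void increment(String group, String name) {
        counters.merge(group + ":" + name, 1L, Long::sum);
    }

    // Hypothetical per-record reduce step that may throw on bad input.
    static void reduceOne(String value) {
        try {
            if (value.isEmpty()) {
                throw new IllegalArgumentException("empty record");
            }
            // ... normal longest-common-substring work would go here ...
        } catch (Exception e) {
            // Counter group "Exceptions", counter named after the message:
            // scanning the job's counter list then shows what was thrown, and how often.
            increment("Exceptions", e.getMessage());
        }
    }

    public static void main(String[] args) {
        reduceOne("ALPHAA");
        reduceOne("");
        reduceOne("");
        System.out.println(counters); // {Exceptions:empty record=2}
    }
}
```

The point of naming the counter after the exception message is that a silently swallowed exception still leaves a visible trace in the job's counter output, which is exactly what was missing when the reduce output never appeared.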
Regarding "java - Hadoop reduce output file never created for large data", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/16551710/