
java - Mahout random forest classifier example ArrayIndexOutOfBoundsException

Reposted. Author: 行者123. Updated: 2023-12-01 13:13:55

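When I try to run the random forest example, I get a java.lang.ArrayIndexOutOfBoundsException: 100 error. The 100 here is tied to the number of trees. The map phase reaches 100% while the reduce phase stays at 0%. I am using hadoop-1.2.1 and mahout-distribution-0.7. I also tried mahout-distribution-0.9 and got the same error.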

Has anyone managed to run this example successfully?

Best answer

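Found the problem. When Hadoop runs with mapred.job.tracker=local, PartialBuilder cannot obtain the number of map tasks from mapred.map.tasks. As a result, the number of trees it computes for each map task is wrong.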

Solution: do not pass the "-p" parameter when running the random forest job on a local Hadoop.
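The effect of a wrong map task count can be illustrated with a small standalone sketch. This is not Mahout's code: the class TreeSplitSketch, its methods, and the numbers 1 and 4 for the assumed and actual map task counts are hypothetical, chosen only to show how an array sized for the requested 100 trees can overflow at index 100 when every mapper builds more trees than its fair share.

```java
// Simplified illustration only; names and numbers are assumptions, not Mahout internals.
class TreeSplitSketch {

    // Trees one mapper builds if the work is divided evenly among the
    // number of map tasks the job *believes* it has.
    static int treesPerMapper(int totalTrees, int assumedMapTasks) {
        return totalTrees / assumedMapTasks;
    }

    // Gathers every mapper's trees into an array sized for totalTrees.
    // Returns how many trees were stored.
    static int collect(int totalTrees, int assumedMapTasks, int actualMapTasks) {
        int[] forest = new int[totalTrees];
        int index = 0;
        int perMapper = treesPerMapper(totalTrees, assumedMapTasks);
        for (int mapper = 0; mapper < actualMapTasks; mapper++) {
            for (int t = 0; t < perMapper; t++) {
                // Throws once index reaches totalTrees.
                forest[index++] = mapper;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Assumed and actual counts agree: 4 mappers x 25 trees fit exactly.
        System.out.println("stored " + collect(100, 4, 4) + " trees");

        // Hypothetical local-mode case: the job assumes 1 map task,
        // but 4 splits actually run, each building all 100 trees.
        try {
            collect(100, 1, 4);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

In this sketch the second call fails as soon as the second mapper tries to store its first tree, which is consistent with the index 100 reported in the question when 100 trees are requested.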

Details:

windiana@host:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.

14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest...
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0
14/08/07 11:25:19 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2df8fdda
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null]
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading the data...
14/08/07 11:25:20 INFO mapred.JobClient: map 0% reduce 0%
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances
14/08/07 11:25:25 INFO mapred.LocalJobRunner:
14/08/07 11:25:26 INFO mapred.JobClient: map 1% reduce 0%

...

14/08/07 11:27:59 INFO mapred.JobClient: map 98% reduce 0%
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done.
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete.
14/08/07 11:28:00 INFO mapred.JobClient: map 99% reduce 0%
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12
14/08/07 11:28:00 INFO mapred.JobClient: File Output Format Counters
14/08/07 11:28:00 INFO mapred.JobClient: Bytes Written=2353226
14/08/07 11:28:00 INFO mapred.JobClient: File Input Format Counters
14/08/07 11:28:00 INFO mapred.JobClient: Bytes Read=0
14/08/07 11:28:00 INFO mapred.JobClient: FileSystemCounters
14/08/07 11:28:00 INFO mapred.JobClient: FILE_BYTES_READ=61962918
14/08/07 11:28:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45667235
14/08/07 11:28:00 INFO mapred.JobClient: Map-Reduce Framework
14/08/07 11:28:00 INFO mapred.JobClient: Map input records=100
14/08/07 11:28:00 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient: Spilled Records=0
14/08/07 11:28:00 INFO mapred.JobClient: Total committed heap usage (bytes)=132120576
14/08/07 11:28:00 INFO mapred.JobClient: CPU time spent (ms)=0
14/08/07 11:28:00 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient: SPLIT_RAW_BYTES=90
14/08/07 11:28:00 INFO mapred.JobClient: Map output records=100
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq

Regarding java - Mahout random forest classifier example ArrayIndexOutOfBoundsException, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22611881/
