gpt4 book ai didi

java - Mallet:OutOfMemoryError:Java 堆空间

转载 作者:行者123 更新时间:2023-11-30 10:28:21 26 4
gpt4 key购买 nike

在 Mallet 中训练数据时,处理因 OutOfMemoryError 而停止。 bin/mallet 中的属性 MEMORY 已经设置为 3GB。训练文件 output.mallet 的大小仅为 31 MB。我试图减少训练数据的大小。但它仍然抛出相同的错误:

a161115@a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999

-------------------- Trial 0 --------------------

Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)

我会感谢对此问题的任何帮助或见解

编辑:这是我的 bin/mallet 文件。

#!/bin/bash


malletdir=`dirname $0`
malletdir=`dirname $malletdir`

cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp

MEMORY=10g

CMD=$1
shift

help()
{
cat <<EOF
Mallet 2.0 commands:

import-dir load the contents of a directory into mallet instances (one per file)
import-file load a single file into mallet instances (one per line)
import-svmlight load SVMLight format data files into Mallet instances
info get information about Mallet instances
train-classifier train a classifier from Mallet data files
classify-dir classify data from a single file with a saved classifier
classify-file classify the contents of a directory with a saved classifier
classify-svmlight classify data from a single file in SVMLight format
train-topics train a topic model from Mallet data files
infer-topics use a trained topic model to infer topics for new documents
evaluate-topics estimate the probability of new documents under a trained model
prune remove features based on frequency or information gain
split divide data into testing, training, and validation portions
bulk-load for big input files, efficiently prune vocabulary and import docs

Include --help with any option for more information
EOF
}

CLASS=

case $CMD in
import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
bulk-load) CLASS=cc.mallet.util.BulkLoader;;
run) CLASS=$1; shift;;
*) echo "Unrecognized command: $CMD"; help; exit 1;;
esac

java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"

还值得一提的是,我的原始训练文件有 60,000 个项目。当我减少项目数量(20,000 个实例)时,训练将正常运行,但使用大约 10GB RAM。

最佳答案

检查bin/mallet中对Java的调用并添加标志-Xmx3g,确保其中没有另一个Xmx;如果是这样,请编辑那个)。

关于java - Mallet:OutOfMemoryError:Java 堆空间,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44689581/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com