gpt4 book ai didi

apache-spark - Spark java.lang.OutOfMemoryError : Java Heap space

转载 作者:行者123 更新时间:2023-12-02 20:22:12 62 4
gpt4 key购买 nike

<分区>

当我使用 spark 运行模型训练管道时出现上述错误

`val inputData = spark.read
.option("header", true)
.option("mode","DROPMALFORMED")
.csv(input)
.repartition(500)
.toDF("b", "c")
.withColumn("b", lower(col("b")))
.withColumn("c", lower(col("c")))
.toDF("b", "c")
.na.drop()`

inputData 有大约 2500 万行,大小约为 2GB。模型构建阶段是这样发生的

val tokenizer = new Tokenizer()
.setInputCol("c")
.setOutputCol("tokens")

val cvSpec = new CountVectorizer()
.setInputCol("tokens")
.setOutputCol("features")
.setMinDF(minDF)
.setVocabSize(vocabSize)

val nb = new NaiveBayes()
.setLabelCol("bi")
.setFeaturesCol("features")
.setPredictionCol("prediction")
.setSmoothing(smoothing)

new Pipeline().setStages(Array(tokenizer, cvSpec, nb)).fit(inputData)

我正在使用以下命令在具有 16gb RAM 的机器上本地运行上述 spark 作业

spark-submit --class holmes.model.building.ModelBuilder ./holmes-model-building/target/scala-2.11/holmes-model-building_2.11-1.0.0-SNAPSHOT-7d6978.jar --master local[*] --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryoserializer.buffer.max=2000m --conf spark.driver.maxResultSize=2g --conf Spark 。 rpc.message.maxSize=1024 --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=50g --driver-memory=12g

oom 错误由(在堆栈跟踪的底部)触发 通过 org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:706)

日志:

Caused by: java.lang.OutOfMemoryError: Java heap space at java.lang.reflect.Array.newInstance(Array.java:75) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1897) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:706) 

任何建议都会很棒:)

62 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com