scala - Saving the model output of a trained decision tree classifier as a text file in Spark Scala


The code I use to train the decision tree is as follows:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.tree.configuration.Algo._
    import org.apache.spark.mllib.tree.impurity.Gini
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Load and parse the data file

    val data = sc.textFile("data/mllib/spt.csv")
    val parsedData = data.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts(0), Vectors.dense(parts.tail))
    }

// Split the data

    val splits = parsedData.randomSplit(Array(0.7, 0.3))
    val (trainingData, testData) = (splits(0), splits(1))

// Train a DecisionTree model.
// An empty categoricalFeaturesInfo indicates that all features are continuous.

    val numClasses = 2
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
      impurity, maxDepth, maxBins)

    val labelAndPreds = trainingData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }

    // Training error
    val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / trainingData.count
    println("Training Error = " + trainErr)

    // Model output
    println("Learned classification tree model:\n" + model)

    println("Learned classification tree model:\n" + model.toDebugString)

I would like "model.toDebugString" to be written out as a text file. I have found many answers similar to this question, but nothing specific. Any concrete help or hints would be much appreciated. Since I am new to Scala, I am also struggling with which libraries I need to include.

I tried the following code:

    modelFile = ~/decisionTreeModel.txt"
    f = open(modelFile,"w")
    f.write(model.toDebugString())
    f.close()

But it gives me this error:

<console>:1: error: ';' expected but '.' found.
modelFile = ~/decisionTreeModel.txt"
^
<console>:1: error: unclosed string literal
modelFile = ~/decisionTreeModel.txt"
^

I also tried saving the model:

    // Save and load model (requires the DecisionTreeModel import)
    import org.apache.spark.mllib.tree.model.DecisionTreeModel

    model.save(sc, "myModelPath")
    val sameModel = DecisionTreeModel.load(sc, "myModelPath")

The above code also throws an error. Any help or suggestions would be appreciated.

Best Answer

Try this (for example in the shell):

snow:~ mkamp$ spark-shell 

...

scala> val rdd = sc.parallelize(List(1,2,3))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:15

scala> new java.io.PrintWriter("/tmp/decisionTreeModel.txt") { write(rdd.toDebugString); close }
res0: java.io.PrintWriter = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@65fc2639

Then on the command line (outside of Spark):

snow:~ mkamp$ cat /tmp/decisionTreeModel.txt 
(4) ParallelCollectionRDD[0] at parallelize at <console>:15 []
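
Applied to the model from the question, the same approach writes the tree description to a plain text file; a minimal sketch, assuming the model value defined in the question (the paths here are just examples):

    // Write the learned tree structure to a local text file
    new java.io.PrintWriter("/tmp/decisionTreeModel.txt") {
      write(model.toDebugString)
      close()
    }

    // Alternative: have Spark write it (produces a directory of part files)
    sc.parallelize(Seq(model.toDebugString), 1).saveAsTextFile("decisionTreeModelDir")

Note that in Scala toDebugString is accessed without parentheses, unlike the Python-style attempt in the question.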

Regarding scala - saving the model output of a trained decision tree classifier as a text file in Spark Scala, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33183857/
