
apache-spark - Printing cluster IDs and their elements with the Spark KMeans algorithm

Reposted by 行者123 · updated 2023-12-04 04:53:53

I have this program that runs the KMeans algorithm on apache-spark and prints the WSSSE (Within Set Sum of Squared Errors). 20 clusters are generated. I am trying to print each cluster ID together with the elements assigned to it. How do I iterate over the cluster IDs and print their elements?

Thanks, everyone!

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/",
      List("target/scala-2.10/kmeans_2.10-1.0.jar"))

    // Load and parse the data
    val data = sc.textFile("kmeans.csv")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // Cluster the data into 20 classes using KMeans
    val numIterations = 20
    val numClusters = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Print each center as a readable vector (printing the raw array
    // would only show its object reference)
    val clusterCenters = clusters.clusterCenters.map(_.toArray)
    println("The cluster centers are: " +
      clusterCenters.map(_.mkString("[", ",", "]")).mkString("; "))

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

Accepted answer

As far as I know, you should run predict for each element:

    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // Bring the vectors to the driver and predict the cluster for each one
    List<Vector> vectors = parsedData.collect();
    for (Vector vector : vectors) {
        System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
    }
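Since the question's code is Scala, a Scala equivalent is worth sketching. This version (a sketch, assuming the same `parsedData: RDD[Vector]` and trained `clusters: KMeansModel` from the question) keeps the grouping distributed by pairing every vector with its predicted cluster ID and calling `groupByKey`, rather than collecting the whole dataset to the driver first:

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Pair every vector with its predicted cluster ID, then group by that ID.
// Assumes `clusters` and `parsedData` are the values defined in the question.
val byCluster: RDD[(Int, Iterable[Vector])] =
  parsedData.map(v => (clusters.predict(v), v)).groupByKey()

// Only the final, grouped result is collected for printing
byCluster.collect().foreach { case (id, members) =>
  println(s"cluster $id:")
  members.foreach(v => println("  " + v))
}
```

Note that `groupByKey` shuffles all vectors across the cluster; for very large datasets, writing out the `(clusterId, vector)` pairs directly, or counting per cluster with `countByKey`, may be cheaper than materializing each group.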

Regarding "apache-spark - Printing cluster IDs and their elements with the Spark KMeans algorithm", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26939281/
