
scala - Spark Scala: How to convert DataFrame[Vector] to DataFrame[f1: Double, ..., fn: Double]

Reposted. Author: 行者123. Updated: 2023-12-04 16:27:01

I just used StandardScaler to normalize my features for an ML application. After selecting the scaled features, I want to convert them back to a DataFrame of Doubles, though the length of my vectors is arbitrary. I know how to do it for a specific set of 3 features by using

myDF.map{case Row(v: Vector) => (v(0), v(1), v(2))}.toDF("f1", "f2", "f3")

but not for an arbitrary number of features. Is there an easy way to do this?

Example:
val testDF = sc.parallelize(List(Vectors.dense(5D, 6D, 7D), Vectors.dense(8D, 9D, 10D), Vectors.dense(11D, 12D, 13D))).map(Tuple1(_)).toDF("scaledFeatures")
val myColumnNames = List("f1", "f2", "f3")
// val finalDF = DataFrame[f1: Double, f2: Double, f3: Double]

Edit

I found out how to unpack the column names when creating the DataFrame, but I still have trouble converting a vector into the sequence needed to create the DataFrame:
finalDF = testDF.map{case Row(v: Vector) => v.toArray.toSeq /* <= this errors */}.toDF(List("f1", "f2", "f3"): _*)
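One way around that error (a sketch, assuming Spark 2.x with `org.apache.spark.ml.linalg.Vector` and an active `SparkSession` named `spark`): mapping to a `Seq[Double]` needs an encoder and would in any case yield a single array column rather than one column per element. Dropping to the RDD API and building an explicit schema sidesteps both problems:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.apache.spark.ml.linalg.Vector

// Column names for however many features the vectors hold
val myColumnNames = List("f1", "f2", "f3")

// One DoubleType field per requested column
val schema = StructType(myColumnNames.map(name => StructField(name, DoubleType, nullable = false)))

// Unpack each vector into a Row of its double elements
val rows = testDF.rdd.map { case Row(v: Vector) => Row.fromSeq(v.toArray.toSeq) }

val finalDF = spark.createDataFrame(rows, schema)
// finalDF: DataFrame[f1: Double, f2: Double, f3: Double]
```

This works for any vector length, as long as `myColumnNames` has one name per element.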

Best Answer

Try VectorSlicer. First assemble the columns into a vector, then slice out the indices you want:

import org.apache.spark.ml.feature.{VectorAssembler, VectorSlicer}

val dataset = spark.createDataFrame(
Seq((1, 0.2, 0.8), (2, 0.1, 0.9), (3, 0.3, 0.7))
).toDF("id", "negative_logit", "positive_logit")


val assembler = new VectorAssembler()
.setInputCols(Array("negative_logit", "positive_logit"))
.setOutputCol("prediction")

val output = assembler.transform(dataset)
output.show()
/*
+---+--------------+--------------+----------+
| id|negative_logit|positive_logit|prediction|
+---+--------------+--------------+----------+
| 1| 0.2| 0.8| [0.2,0.8]|
| 2| 0.1| 0.9| [0.1,0.9]|
| 3| 0.3| 0.7| [0.3,0.7]|
+---+--------------+--------------+----------+
*/

val slicer = new VectorSlicer()
.setInputCol("prediction")
.setIndices(Array(1))
.setOutputCol("positive_prediction")

val posi_output = slicer.transform(output)
posi_output.show()

/*
+---+--------------+--------------+----------+-------------------+
| id|negative_logit|positive_logit|prediction|positive_prediction|
+---+--------------+--------------+----------+-------------------+
| 1| 0.2| 0.8| [0.2,0.8]| [0.8]|
| 2| 0.1| 0.9| [0.1,0.9]| [0.9]|
| 3| 0.3| 0.7| [0.3,0.7]| [0.7]|
+---+--------------+--------------+----------+-------------------+
*/
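Note that VectorSlicer still leaves you with a (shorter) vector column. If the goal is one Double column per element for a vector of arbitrary length, here is a sketch using `vector_to_array`, assuming Spark 3.0 or later (the names `n` and `arr` are illustrative):

```scala
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.sql.functions.col

val n = 2  // number of elements in each vector

// Convert the vector column to an array column, then select one element per output column
val arrDF = output.withColumn("arr", vector_to_array(col("prediction")))
val finalDF = arrDF.select((0 until n).map(i => col("arr").getItem(i).alias(s"f${i + 1}")): _*)
// finalDF: DataFrame[f1: Double, f2: Double]
```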

Regarding scala - Spark Scala: How to convert DataFrame[Vector] to DataFrame[f1: Double, ..., fn: Double], we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38110038/
