
apache-spark - ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector

Reposted. Author: 行者123. Updated: 2023-12-04 04:42:57

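Can someone help me resolve the following error? I am trying to convert a DataFrame to an RDD so that it can be used to build a regression model.

Spark version: 2.0.0

Error =>
ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector

Code =>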

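import org.apache.spark.ml.feature.{Binarizer, VectorAssembler}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// df is an existing DataFrame; it has to be a var because it is reassigned below.

// Binarize repay_amt into a 0.0/1.0 label column using a threshold of 20.00.
val binarizer2: Binarizer = new Binarizer()
  .setInputCol("repay_amt").setOutputCol("label").setThreshold(20.00)

df = binarizer2.transform(df)

// Assemble the numeric columns into a single "features" vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("tot_txns", "avg_unpaiddue", "max_unpaiddue", "sale_txn", "max_amt", "tot_sale_amt"))
  .setOutputCol("features")

df = assembler.transform(df)

df.write.mode(SaveMode.Overwrite).parquet("lazpay_final_data.parquet")

val df2 = spark.read.parquet("lazpay_final_data.parquet/")
// The ClassCastException is thrown here: VectorAssembler produces org.apache.spark.ml.linalg
// vectors, while this LabeledPoint expects an org.apache.spark.mllib.linalg.Vector.
val df3 = df2.rdd.map(r => LabeledPoint(r.getDouble(0), r.getAs("features")))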

Data =>

[screenshot of the data in the original post; image not available]

Best answer

I solved this by first converting the ml SparseVector to a DenseVector and then converting that to an mllib Vector.

For example:

val denseVector = r.getAs[org.apache.spark.ml.linalg.SparseVector]("features").toDense
org.apache.spark.mllib.linalg.Vectors.fromML(denseVector)
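
Note that the cast to SparseVector in the snippet above only succeeds when the stored vector is actually sparse, whereas the error in the question mentions a DenseVector. The following is a minimal sketch of a more general variant, not the answerer's exact code. It assumes the DataFrame has a Double column named "label" and an ml vector column named "features", as in the question. The object name, the two method names, and the tiny in-memory DataFrame in main are made up for illustration. It relies on Vectors.fromML and MLUtils.convertVectorColumnsFromML, both available since Spark 2.0.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object VectorConversionSketch {

  // Row-level conversion: read the column as the ml Vector trait, so both
  // DenseVector and SparseVector values are accepted, then convert with fromML.
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.rdd.map { r =>
      val label = r.getAs[Double]("label")
      val features = r.getAs[MLVector]("features")
      LabeledPoint(label, OldVectors.fromML(features))
    }

  // Column-level conversion: let MLUtils rewrite the vector column to the
  // mllib type first, then read it back as an mllib Vector.
  def toLabeledPointsViaMLUtils(df: DataFrame): RDD[LabeledPoint] =
    MLUtils.convertVectorColumnsFromML(df, "features")
      .rdd
      .map(r => LabeledPoint(r.getAs[Double]("label"), r.getAs[OldVector]("features")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vector-conversion-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows: one dense and one sparse ml vector.
    val df = Seq(
      (1.0, MLVectors.dense(1.0, 2.0, 3.0)),
      (0.0, MLVectors.sparse(3, Array(0), Array(4.0)))
    ).toDF("label", "features")

    toLabeledPoints(df).collect().foreach(println)
    toLabeledPointsViaMLUtils(df).collect().foreach(println)

    spark.stop()
  }
}
```

With the question's data, something like VectorConversionSketch.toLabeledPoints(df2) would take the place of the failing df2.rdd.map(...) line. Reading the label by column name also avoids depending on the label being at position 0 in the Parquet schema.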

Regarding apache-spark - ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to org.apache.spark.mllib.linalg.Vector, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40109807/
