gpt4 book ai didi

hadoop - Spark ml 模型保存到 hdfs

转载 作者:可可西里 更新时间:2023-11-01 16:44:23 24 4
gpt4 key购买 nike

我正在尝试将我的模型保存为从 spark ml 库创建的对象。

但是,它给我一个错误:

线程“main”中的异常 java.lang.NoSuchMethodError: org.apache.spark.ml.PipelineModel.save(Ljava/lang/String;)V 在 com.sf.prediction$.main(prediction.scala:61) 在 com.sf.prediction.main(prediction.scala) 在 sun.reflect.NativeMethodAccessorImpl.invoke0( native 方法) 在 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 在 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 在 java.lang.reflect.Method.invoke(Method.java:606) 在 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) 在 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) 在 org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) 在 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) 在 org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

下面是我的依赖:

    <dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>2.1.7</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<type>maven-plugin</type>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-parser-combinators</artifactId>
<version>2.11.0-M4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.0</version>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.2</version>
</dependency>

<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
</dependency>

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.1</version>
</dependency>

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>1.6.0</version>
</dependency>

我还想将模型生成的数据框保存为 csv。

model.transform(df).select("features","label","prediction").show()



import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

import org.apache.spark.SparkConf

import org.apache.spark.sql.hive.HiveContext


import org.apache.spark.ml.feature.OneHotEncoder
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.PipelineModel._
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
import org.apache.spark.ml.util.MLWritable

object prediction {
def main(args: Array[String]): Unit = {

val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("conversion")
val sc = new SparkContext(conf)

val hiveContext = new HiveContext(sc)

val df = hiveContext.sql("select * from prediction_test")
df.show()
val credit_indexer = new StringIndexer().setInputCol("transaction_credit_card").setOutputCol("creditCardIndex").fit(df)
val category_indexer = new StringIndexer().setInputCol("transaction_category").setOutputCol("categoryIndex").fit(df)
val location_flag_indexer = new StringIndexer().setInputCol("location_flag").setOutputCol("locationIndex").fit(df)
val label_indexer = new StringIndexer().setInputCol("fraud").setOutputCol("label").fit(df)

val assembler = new VectorAssembler().setInputCols(Array("transaction_amount", "creditCardIndex","categoryIndex","locationIndex")).setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array(credit_indexer, category_indexer, location_flag_indexer, label_indexer, assembler, lr))

val model = pipeline.fit(df)

pipeline.save("/user/f42h/prediction/pipeline")
model.save("/user/f42h/prediction/model")
// val sameModel = PipelineModel.load("/user/bob/prediction/model")
model.transform(df).select("features","label","prediction")

}
}

最佳答案

您使用的是 Spark 1.6.0,据我所知,ML 模型的保存/加载仅适用于 2.0 及更高版本。您可以使用具有 2.0.0-preview 版本的工件进行预览:http://search.maven.org/#search%7Cga%7C1%7Cg%3Aorg.apache.spark%20v%3A2.0.0-preview

关于hadoop - Spark ml 模型保存到 hdfs,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37848723/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com