
scala - How to map variable names to features after a pipeline

Reposted. Author: 行者123. Updated: 2023-12-04 06:16:45

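I modified the OneHotEncoder example so that it actually trains a LogisticRegression. My question is: how do I map the resulting weights back to the categorical variables?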

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SQLContext

def oneHotEncoderExample(sqlContext: SQLContext): Unit = {

  val df = sqlContext.createDataFrame(Seq(
    (0, "a", 1.0),
    (1, "b", 1.0),
    (2, "c", 0.0),
    (3, "d", 1.0),
    (4, "e", 1.0),
    (5, "f", 0.0)
  )).toDF("id", "category", "label")
  df.show()

  // Map each category string to a numeric index.
  val indexer = new StringIndexer()
    .setInputCol("category")
    .setOutputCol("categoryIndex")
    .fit(df)
  val indexed = indexer.transform(df)
  indexed.select("id", "categoryIndex").show()

  // One-hot encode the index into the "features" vector column.
  val encoder = new OneHotEncoder()
    .setInputCol("categoryIndex")
    .setOutputCol("features")
  val encoded = encoder.transform(indexed)
  encoded.select("id", "features").show()

  val lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.01)

  val pipeline = new Pipeline()
    .setStages(Array(indexer, encoder, lr))

  // Fit the pipeline to the training data.
  val pipelineModel = pipeline.fit(df)

  // The last stage of the fitted pipeline is the trained model.
  val lorModel = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
  println(s"LogisticRegression: $lorModel")
  // Print the weights and intercept for logistic regression.
  println(s"Weights: ${lorModel.coefficients} Intercept: ${lorModel.intercept}")
}

Output:

Weights: [1.5098946631236487,-5.509833649232324,1.5098946631236487,1.5098946631236487,-5.509833649232324] Intercept: 2.6679020381781235

Best answer

I assume what you want here is access to the feature metadata. Let's start by transforming the existing DataFrame:

val transformedDF = pipelineModel.transform(df)

Next, you can extract the metadata object:

val meta: org.apache.spark.sql.types.Metadata = transformedDF
  .schema(transformedDF.schema.fieldIndex("features"))
  .metadata

Finally, let's extract the attributes:

meta.getMetadata("ml_attr").getMetadata("attrs")
// org.apache.spark.sql.types.Metadata = {"binary":[
// {"idx":0,"name":"e"},{"idx":1,"name":"f"},{"idx":2,"name":"a"},
// {"idx":3,"name":"b"},{"idx":4,"name":"c"}]}

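Each attribute's idx is its position in the features vector, so these attributes can be used to relate the weights back to the original categories.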
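As a minimal sketch of that last step, the snippet below pairs each attribute name with the coefficient at the same index. It deliberately avoids Spark so it can run on its own: the (idx, name) pairs are copied by hand from the metadata output above and the weights from the output in the question. The object name WeightNames and the helper nameWeights are hypothetical names introduced here for illustration, not part of any Spark API.

```scala
object WeightNames {
  // (idx, name) pairs copied from the "attrs" metadata printed in the answer.
  val attrs: Seq[(Int, String)] =
    Seq(0 -> "e", 1 -> "f", 2 -> "a", 3 -> "b", 4 -> "c")

  // Coefficients copied from the "Weights" line in the question's output.
  val weights: Vector[Double] = Vector(
    1.5098946631236487, -5.509833649232324, 1.5098946631236487,
    1.5098946631236487, -5.509833649232324)

  // Hypothetical helper: look up the weight at each attribute's index.
  def nameWeights(attrs: Seq[(Int, String)],
                  weights: IndexedSeq[Double]): Seq[(String, Double)] =
    attrs.sortBy(_._1).map { case (idx, name) => name -> weights(idx) }

  def main(args: Array[String]): Unit =
    nameWeights(attrs, weights).foreach { case (name, w) =>
      println(s"$name -> $w")
    }
}
```

With this data the categories labelled 1.0 (a, b, e) get the positive weight and those labelled 0.0 (c, f) get the negative one. Category d has no weight of its own, presumably because OneHotEncoder drops the last category by default (dropLast = true), which leaves d as the reference level absorbed by the intercept.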

Regarding "scala - How to map variable names to features after a pipeline", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36122559/
