
scala - Spark MLLib linear regression model intercept is always 0.0?

Reposted. Author: 行者123. Updated: 2023-12-01 23:52:34

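I have just started with ML and Apache Spark, so I have been trying out linear regression based on the Spark examples. I can't seem to generate a correct model for any data other than the sample shipped with the example, and the intercept is always 0.0 regardless of the input data.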

I prepared a simple training data set based on the function:

y = (2*x1) + (3*x2) + 4

That is, I expect an intercept of 4 and weights of (2, 3).
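The post does not show how the dummy file was produced, so the following is only a minimal sketch of one way to generate it. The object name `DummyData`, the 1..10 grid of feature values, and the default output path are all assumptions for illustration; only the function itself and the `y,x1,x2` column order come from the question.

```scala
// Hypothetical generator for the dummy data set; the 1..10 grid is an assumption.
object DummyData {
  // The target function from the question: y = (2*x1) + (3*x2) + 4
  def label(x1: Double, x2: Double): Double = 2 * x1 + 3 * x2 + 4

  // One "y,x1,x2" line per grid point, matching the column order the parser reads.
  def lines(maxValue: Int = 10): Seq[String] =
    for {
      x1 <- 1 to maxValue
      x2 <- 1 to maxValue
    } yield s"${label(x1, x2)},${x1.toDouble},${x2.toDouble}"

  def main(args: Array[String]): Unit = {
    // Output path is taken from the first argument, with an assumed default.
    val path = args.headOption.getOrElse("dummydata.txt")
    val out = new java.io.PrintWriter(path)
    try lines().foreach(out.println) finally out.close()
  }
}
```

Because the labels are computed without noise, an exact fit exists, which makes it easy to see whether a trained model has recovered the intercept and weights.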

If I run LinearRegressionWithSGD.train(...) on the raw data, the model is:

Model intercept: 0.0, weights: [NaN,NaN]

and the predictions are all NaN:

Features: [1.0,1.0], Predicted: NaN, Actual: 9.0
Features: [1.0,2.0], Predicted: NaN, Actual: 12.0

and so on.
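A plausible explanation for the NaN weights, not confirmed anywhere in the post, is that gradient descent diverges on unscaled features when the step size is too large. The sketch below is plain batch gradient descent written from scratch to illustrate that effect. It is not Spark's implementation (Spark's optimizer samples mini-batches and shrinks the step size over iterations), and the names `DivergenceSketch`, `fit`, and `isFinite`, the 1..10 data grid, and the step sizes tried are assumptions.

```scala
object DivergenceSketch {
  type Point = (Double, Array[Double]) // (label, features)

  // Raw training set for y = 2*x1 + 3*x2 + 4 on an assumed 1..10 grid.
  val raw: Seq[Point] =
    for (x1 <- 1 to 10; x2 <- 1 to 10)
      yield (2.0 * x1 + 3.0 * x2 + 4.0, Array(x1.toDouble, x2.toDouble))

  // Batch gradient descent on mean squared error, with no intercept term.
  def fit(data: Seq[Point], iterations: Int, step: Double): Array[Double] = {
    val dim = data.head._2.length
    var w = Array.fill(dim)(0.0)
    for (_ <- 1 to iterations) {
      val grad = Array.fill(dim)(0.0)
      for ((y, x) <- data) {
        // Prediction error for this point under the current weights.
        val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
        for (j <- 0 until dim) grad(j) += err * x(j) / data.size
      }
      w = w.zip(grad).map { case (wi, gi) => wi - step * gi }
    }
    w
  }

  // True only if every weight is an ordinary finite number.
  def isFinite(w: Array[Double]): Boolean =
    w.forall(v => !v.isNaN && !v.isInfinite)
}
```

On this data, `DivergenceSketch.fit(DivergenceSketch.raw, 1000, 0.2)` overshoots on every update until the weights stop being finite, while a much smaller step such as 0.01 stays finite. Standardizing the features shrinks their magnitude, which is one reason a step size that fails on raw data can work on scaled data.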

If I scale the data first, I get:

Model intercept: 0.0, weights: [17.407863391511754,2.463212481736855]

Features: [1.0,1.0], Predicted: 19.871075873248607, Actual: 9.0
Features: [1.0,2.0], Predicted: 22.334288354985464, Actual: 12.0
Features: [1.0,3.0], Predicted: 24.797500836722318, Actual: 15.0

and so on.

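Either I'm doing something wrong, or I don't understand what the output of this model should be. Can anyone suggest where I might be going wrong?

My code is as follows: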

    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    // Load and parse the dummy data (y, x1, x2) for y = (2*x1) + (3*x2) + 4
    // i.e. intercept should be 4, weights (2, 3)?
    val data = sc.textFile("data/dummydata.txt")

    // LabeledPoint is (label, [features])
    val parsedData = data.map { line =>
      val parts = line.split(',')
      val label = parts(0).toDouble
      val features = Array(parts(1), parts(2)).map(_.toDouble)
      LabeledPoint(label, Vectors.dense(features))
    }

    // Scale the features to zero mean and unit standard deviation
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(parsedData.map(x => x.features))
    val scaledData = parsedData.map { x =>
      LabeledPoint(x.label, scaler.transform(Vectors.dense(x.features.toArray)))
    }

    // Building the model: SGD = stochastic gradient descent
    val numIterations = 1000
    val step = 0.2
    val model = LinearRegressionWithSGD.train(scaledData, numIterations, step)

    println(s">>>> Model intercept: ${model.intercept}, weights: ${model.weights}")

    // Evaluate model on training examples
    val valuesAndPreds = scaledData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, point.features, prediction)
    }
    // Print out features, actual and predicted values...
    valuesAndPreds.take(10).foreach { case (v, f, p) =>
      println(s"Features: ${f}, Predicted: ${p}, Actual: ${v}")
    }

Best answer

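@Noah: Thanks. Your suggestion prompted me to look at this again, and I found some example code that lets you fit an intercept and set other parameters, such as the number of iterations, through the optimizer.

Here is my revised model-building code, which seems to work correctly on my dummy data: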

    // Building the model: SGD = stochastic gradient descent:
    // Need to setIntercept = true, and seems only to work with scaled data
    val numIterations = 600
    val stepSize = 0.1
    val algorithm = new LinearRegressionWithSGD()
    algorithm.setIntercept(true)
    algorithm.optimizer
      .setNumIterations(numIterations)
      .setStepSize(stepSize)

    val model = algorithm.run(scaledData)

It still seems to need scaled data rather than the raw data as input, but that is fine for my purposes.
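One consequence of training on scaled data is that the fitted intercept and weights describe the standardized features, so they will not read as 4 and (2, 3) directly. Assuming each feature was standardized as x' = (x - mean) / std, the coefficients can be mapped back with w = w' / std and b = b' - sum(w * mean). The helper below is a hypothetical sketch of that algebra; `Unscale` and `toRawScale` are names made up for illustration and are not part of Spark.

```scala
object Unscale {
  // Maps coefficients fitted on standardized features x' = (x - mean) / std
  // back to the original feature scale.
  def toRawScale(
      scaledWeights: Array[Double],
      scaledIntercept: Double,
      means: Array[Double],
      stds: Array[Double]): (Array[Double], Double) = {
    // w_j = w'_j / std_j
    val rawWeights = scaledWeights.zip(stds).map { case (w, s) => w / s }
    // b = b' - sum_j (w_j * mean_j)
    val rawIntercept =
      scaledIntercept - rawWeights.zip(means).map { case (w, m) => w * m }.sum
    (rawWeights, rawIntercept)
  }
}
```

With the objects from the code above, the inputs would presumably be `model.weights.toArray`, `model.intercept`, `scaler.mean.toArray`, and `scaler.std.toArray`. Treat that call pattern as an assumption to verify against the Spark version in use.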

For more on "scala - Spark MLLib linear regression model intercept is always 0.0?", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/26259743/
