
machine-learning - PySpark k-fold cross-validation average RMSE

Repost. Author: 行者123. Updated: 2023-11-30 08:45:28

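I am using PySpark to run linear regression with k-fold cross-validation on a dataset. So far I can only get the RMSE of the best model, but I want the average RMSE of all the models evaluated during cross-validation. How can I get the average RMSE across all of those models?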

from pyspark.sql.functions import col  # needed for col("y") below
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

file_name = '/tmp/user/userfile/LS.csv'
data = spark.read.options(header='true', inferschema='true',
                          delimiter=',').csv(file_name)
data.cache()
features = ["x"]
lr_data = data.select(col("y").alias("label"), *features)
(training, test) = lr_data.randomSplit([.7, .3])

vectorAssembler = VectorAssembler(inputCols=features, outputCol="features")
training_ds = vectorAssembler.transform(training)
test_ds = vectorAssembler.transform(test)

lr = LinearRegression(maxIter=5, solver="l-bfgs") # solver="l-bfgs" here

modelEvaluator=RegressionEvaluator()

paramGrid = (ParamGridBuilder()
             .addGrid(lr.regParam, [0.1, 0.01])
             .addGrid(lr.elasticNetParam, [0, 1])
             .build())

crossval = CrossValidator(estimator=lr,
                          estimatorParamMaps=paramGrid,
                          evaluator=modelEvaluator,
                          numFolds=2)

cvModel = crossval.fit(training_ds)

prediction = cvModel.transform(test_ds)

evaluator = RegressionEvaluator(labelCol="label",
                                predictionCol="prediction",
                                metricName="rmse")

rms = evaluator.evaluate(prediction)
print("Root Mean Squared Error (RMSE) on test data = %g" % rms)

Best answer

Just extract the other models from the cross-validator:

Spark CrossValidatorModel access other models than the bestModel?

Then run the regression evaluator on each model's predictions and compute the average manually.
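
A minimal sketch of that approach, not a definitive implementation. The helper names `mean_rmse_per_param_map` and `mean_rmse_of_sub_models` are hypothetical. It relies on two `CrossValidatorModel` attributes: `avgMetrics`, which holds the validation metric averaged over the folds for each parameter map, and `subModels`, which is populated only when the `CrossValidator` is created with `collectSubModels=True` (available in Spark 2.3 and later, as far as I know).

```python
from statistics import mean


def mean_rmse_per_param_map(cv_model):
    """Return the fold-averaged validation metric for each param map.

    CrossValidatorModel.avgMetrics already stores these averages, in the
    same order as the estimatorParamMaps passed to CrossValidator. With a
    default RegressionEvaluator the metric is RMSE.
    """
    return list(cv_model.avgMetrics)


def mean_rmse_of_sub_models(cv_model, evaluator, dataset):
    """Evaluate every retained sub-model on `dataset` and average the scores.

    Assumes the CrossValidator was built with collectSubModels=True;
    otherwise subModels is None and there is nothing to evaluate.
    """
    if cv_model.subModels is None:
        raise ValueError("fit the CrossValidator with collectSubModels=True")
    scores = [
        evaluator.evaluate(model.transform(dataset))
        for fold_models in cv_model.subModels  # one list per fold
        for model in fold_models               # one model per param map
    ]
    return mean(scores)


# Usage with the objects from the question (assumes an active SparkSession):
# crossval = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid,
#                           evaluator=modelEvaluator, numFolds=2,
#                           collectSubModels=True)
# cvModel = crossval.fit(training_ds)
# print(mean_rmse_per_param_map(cvModel))                      # per param map
# print(mean_rmse_of_sub_models(cvModel, evaluator, test_ds))  # on test data
```

If only the cross-validation scores are needed, `avgMetrics` alone may be enough. Evaluating the sub-models by hand is useful when the average should be taken on a different dataset, such as the held-out test split.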

Regarding machine-learning - PySpark k-fold cross-validation average RMSE, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53804250/
