python - Interpreting OLS weights after PCA (Python)

Reposted. Author: 行者123. Updated: 2023-12-01 08:12:34

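I want to interpret the regression weights of a model whose input data has been preprocessed with PCA. In practice I have several hundred highly correlated input dimensions, so I know PCA is useful. For illustration, though, I will use the Iris dataset.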

The sklearn code below illustrates my problem:

import numpy as np
import sklearn.datasets, sklearn.decomposition
from sklearn.linear_model import LinearRegression

# load data
X = sklearn.datasets.load_iris().data
w = np.array([0.3, 10, -0.1, -0.01])
Y = np.dot(X, w)

# set number of components to keep from PCA
n_components = 4

# reconstruct w
reg = LinearRegression().fit(X, Y)
w_hat = reg.coef_
print(w_hat)

# apply PCA
pca = sklearn.decomposition.PCA(n_components=n_components)
pca.fit(X)
X_trans = pca.transform(X)

# reconstruct w
reg_trans = LinearRegression().fit(X_trans, Y)
w_trans_hat = np.dot(reg_trans.coef_, pca.components_)
print(w_trans_hat)

Running this code shows that the weights are recovered well.

However, if I set the number of components to 3 (i.e. n_components = 3), the printed weights deviate substantially from the true weights.

Am I misunderstanding how to transform these weights back? Or is this due to the information lost by PCA when going from 4 components to 3?
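
One way to see how the recovered weights depend on the number of retained components is a quick sweep over n_components. The sketch below reuses the Iris setup from the code above; the loop and the names `w_back` and `results` are illustrative additions, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

X = load_iris().data
w = np.array([0.3, 10, -0.1, -0.01])  # same synthetic weights as above
Y = X @ w

results = []  # (k, weights mapped back to feature space, R^2)
for k in range(1, X.shape[1] + 1):
    pca = PCA(n_components=k).fit(X)
    X_trans = pca.transform(X)
    reg = LinearRegression().fit(X_trans, Y)
    # map the component-space coefficients back to the original features
    w_back = reg.coef_ @ pca.components_
    r2 = reg.score(X_trans, Y)
    results.append((k, w_back, r2))
    print(k, np.round(w_back, 3), round(r2, 4))
```

With all 4 components the mapping back is lossless, so the last row should match w; for smaller k both the weights and the fit quality reflect whatever part of w lies along the discarded directions.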

Best answer

I think this is working correctly; it is just that I was looking at w_trans_hat rather than the reconstructed Y:

import numpy as np
import sklearn.datasets, sklearn.decomposition
from sklearn.linear_model import LinearRegression

# load data
X = sklearn.datasets.load_iris().data
# create fake loadings
w = np.array([0.3, 10, -0.1, -0.01])
# centre X
X = np.subtract(X, np.mean(X, 0))
# calculate Y
Y = np.dot(X, w)

# set number of components to keep from PCA
n_components = 3

# reconstruct w using linear regression
reg = LinearRegression().fit(X, Y)
w_hat = reg.coef_
print(w_hat)

# apply PCA
pca = sklearn.decomposition.PCA(n_components=n_components)
pca.fit(X)
X_trans = pca.transform(X)

# regress Y on principal components
reg_trans = LinearRegression().fit(X_trans, Y)
# reconstruct Y using regressed weights and transformed X
# (the intercept is ~0 here because X, and therefore Y, is centred)
Y_trans = np.dot(X_trans, reg_trans.coef_)
# show MSE to original Y
print(np.mean((Y - Y_trans) ** 2))

# show w implied by reduced model in original space
w_trans_hat = np.dot(reg_trans.coef_, pca.components_)
print(w_trans_hat)
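
As a possible explanation for why w_trans_hat differs from w when a component is dropped: if Y is an exact linear function of the centred X (as in the code above), the weights mapped back from the reduced model are the projection of w onto the retained principal axes, and the missing part lies entirely along the discarded axes. The sketch below checks this with plain NumPy on hypothetical random correlated data (not Iris); PCA is done via SVD and the regression via `np.linalg.lstsq`, and the names `V_keep`, `V_drop`, `w_proj` and `w_lost` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: 200 samples, 4 features (an assumption, not Iris)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(200, 4)) @ A
w = np.array([0.3, 10, -0.1, -0.01])

Xc = X - X.mean(axis=0)  # centre X, as PCA does internally
Y = Xc @ w               # noiseless target, as in the example above

k = 3
# PCA via SVD: rows of Vt are the principal axes (analogous to pca.components_)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_keep, V_drop = Vt[:k], Vt[k:]

# OLS of Y on the k component scores (no intercept needed: everything is centred)
scores = Xc @ V_keep.T
coef, *_ = np.linalg.lstsq(scores, Y, rcond=None)
w_back = coef @ V_keep   # weights implied in the original feature space

w_proj = V_keep.T @ (V_keep @ w)  # projection of w onto the retained axes
w_lost = V_drop.T @ (V_drop @ w)  # part of w along the discarded axes

print(np.allclose(w_back, w_proj))
print(np.allclose(w_back + w_lost, w))
```

Under these assumptions the mapping back is not wrong; it simply cannot recover the part of w that points along the discarded component. With noisy targets the identity would only hold approximately.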

Regarding python - Interpreting OLS weights after PCA (Python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55148756/
