
python - How do I create a correlation matrix in PCA in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:42:14

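How do I create a correlation matrix in PCA in Python? Below, I build a DataFrame of the eigenvector loadings from pca.components_, but I do not know how to create the actual correlation matrix (i.e. how strongly each variable is correlated with each principal component). Any pointers?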

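In addition, I noticed that many of the eigenvector loadings in Python are negative. I am trying to replicate a study done in Stata, and oddly the Python loadings appear to be negative where the Stata correlations are positive (see the attached image of the correlation matrix I am trying to replicate in Python). This is just something I noticed. What is going on here?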

[Image: Stata-created correlation matrix]

Thanks in advance.

import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta
import blpinterface.blp_interface as blp
from scipy.stats import zscore
from sklearn.decomposition import PCA

#Set dates for analysis
startDate = "20000101"

#Construct tickers for analysis
tickers = ["USGG2YR Index", "USGG5YR Index", "USGG10YR Index", "USGG30YR Index", "USGGT10Y Index", ".30YREAL Index",
"USGGBE10 Index", "USGGBE30 Index", ".RATEVOL1 Index", ".RATEVOL2 Index", "SPX Index", "S5INDU Index", "S5CONS Index", "VIX Index",
".DMFX Index", ".EMFX Index", "CL1 Comdty", "HG1 Comdty", "XAU Curncy"]

#Begin dataframe construction
mgr = blp.BLPInterface()

df = mgr.historicalRequest(tickers, "PX_LAST", startDate, "20160317")
df = df.dropna()
df = df.apply(zscore)

#Conduct PCA analysis
pca=PCA(n_components=3)
pca.fit(df) #Estimates the eigenvectors of the z-scored dataframe (one variable per ticker) for data dating back to 2000
print(pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2", "PC3"]).transpose()) #Eigenvector loadings, one column per PC; PCs are sorted from highest explained variance to lowest
print(pca.explained_variance_) #Eigenvalues (variance of the data projected onto each eigenvector)
print(pca.explained_variance_ratio_) #Explained variance ratio (i.e. how much of the variation in the time series is explained by the respective principal component); eigenvalue/(sum of all eigenvalues), which is roughly eigenvalue/(n variables) for z-scored data

#Project data onto the above loadings for each row in the time series
outputpca = pd.DataFrame(pca.transform(df), columns=["PC1", "PC2", "PC3"], index=df.index)
print(outputpca) #Principal component time series, projecting the data onto the above loadings; this is the sum product of the data and the eigenvector loadings for all three PCs for each row
outputpca.plot(title="Principal Components")
plt.show()

Best answer

You can use the correlation function that ships with numpy, np.corrcoef. Example:

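import numpy as np

# X_std is the standardized data, with observations in rows and variables in columns
cor_mat1 = np.corrcoef(X_std.T)  # correlation matrix of the variables
eig_vals, eig_vecs = np.linalg.eig(cor_mat1)  # eigenvalues and eigenvectors (columns of eig_vecs)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)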
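
To get the matrix the question asks for (how strongly each variable correlates with each principal component), one option is to scale each eigenvector by the square root of its eigenvalue. For standardized data this equals the Pearson correlation between the variables and the component scores. The sketch below checks that on synthetic data, not on the asker's Bloomberg series; the names `raw`, `scores`, `var_pc_corr` and `loadings` are made up for the illustration, and `np.linalg.eigh` is swapped in for `np.linalg.eig` because the correlation matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 500 observations of 4 correlated variables (assumption)
raw = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
X_std = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # z-scores, same scaling as scipy.stats.zscore

cor_mat = np.corrcoef(X_std.T)
eig_vals, eig_vecs = np.linalg.eigh(cor_mat)  # eigh returns eigenvalues in ascending order
order = np.argsort(eig_vals)[::-1]            # reorder from largest to smallest eigenvalue
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

scores = X_std @ eig_vecs                     # principal component scores, one column per PC

# Route 1: correlate every variable with every component directly
n_vars = X_std.shape[1]
var_pc_corr = np.corrcoef(X_std.T, scores.T)[:n_vars, n_vars:]  # rows = variables, columns = PCs

# Route 2: scale each eigenvector by the square root of its eigenvalue
loadings = eig_vecs * np.sqrt(eig_vals)

print(np.round(var_pc_corr, 3))
print(np.allclose(var_pc_corr, loadings))     # the two routes should agree
```

With sklearn, `pca.components_.T` plays the role of `eig_vecs`. Note that `pca.explained_variance_` uses an n-1 divisor, so it differs from the eigenvalues of the correlation matrix by a factor of n/(n-1) when the data were z-scored with the default `ddof=0`.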

This link presents an application that uses the correlation matrix in PCA.
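
On the sign question: an eigenvector is only defined up to its sign, so a loading column and its negative describe the same component. That is a likely reason, though not one confirmed here, for Python and Stata reporting opposite signs. The sketch below uses a small made-up correlation matrix and a hypothetical helper, `align_signs`, that applies one possible convention (make the largest-magnitude entry of each eigenvector positive) so that results from different tools can be compared.

```python
import numpy as np

def align_signs(eig_vecs):
    """Flip each column so its largest-magnitude entry is positive (one possible convention)."""
    idx = np.argmax(np.abs(eig_vecs), axis=0)
    signs = np.sign(eig_vecs[idx, np.arange(eig_vecs.shape[1])])
    return eig_vecs * signs

# Made-up 3x3 correlation matrix, for illustration only
cor_mat = np.array([[1.0, 0.8, 0.3],
                    [0.8, 1.0, 0.5],
                    [0.3, 0.5, 1.0]])
eig_vals, eig_vecs = np.linalg.eigh(cor_mat)
flipped = -eig_vecs  # every column negated

# Both versions satisfy R v = lambda v, so neither sign is "the" correct one
print(np.allclose(cor_mat @ eig_vecs, eig_vecs * eig_vals))
print(np.allclose(cor_mat @ flipped, flipped * eig_vals))

# After applying the same convention, the two versions coincide
print(np.allclose(align_signs(eig_vecs), align_signs(flipped)))
```

If a column is flipped, the matching column of component scores and of variable-to-component correlations flips with it, so the interpretation of the component is unchanged.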

Regarding "python - How do I create a correlation matrix in PCA in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52769130/
