
python - ValueError: endog and exog matrices are different sizes - how to drop data in specific columns only?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:28:37

I am trying to run a multivariate regression and get the error:

"ValueError: endog and exog matrices are different sizes"

My code snippet is below:

df_raw = pd.DataFrame(data=df_raw)

y = (df_raw['daily pct return']).astype(float)
x1 = (df_raw['Excess daily return']).astype(float)
x2 = (df_raw['Excess weekly return']).astype(float)
x3 = (df_raw['Excess monthly return']).astype(float)
x4 = (df_raw['Trading vol / mkt cap']).astype(float)
x5 = (df_raw['Std dev']).astype(float)
x6 = (df_raw['Residual risk']).astype(float)

y = y.replace([np.inf, -np.inf],np.nan).dropna()

print(y.shape)
print(x1.shape)
print(x2.shape)
print(x3.shape)
print(x4.shape)
print(x5.shape)
print(x6.shape)


df_raw.to_csv('Raw_final.csv', header=True)

result = smf.OLS(exog=y, endog=[x1, x2, x3, x4, x5, x6]).fit()
print(result.params)
print(result.summary())

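As you can see from my code, I am checking the "shape" of each variable. I get the following output, which shows that the cause of the error is that the y variable has only 48392 values, whereas all the other variables have 48393:

(48392,)
(48393,)
(48393,)
(48393,)
(48393,)
(48393,)
(48393,)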
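
The mismatch most likely comes from `y = y.replace([np.inf, -np.inf], np.nan).dropna()`: calling `dropna()` on `y` alone shortens that one Series, while `x1` to `x6` keep every row. A minimal sketch with made-up values (the small frame `demo` is hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the first row has no daily return.
demo = pd.DataFrame({
    "daily pct return": [np.nan, 0.26, 0.07, 0.13],
    "Excess daily return": [np.nan, 0.32, 0.12, 0.06],
})

# Same pattern as in the question: only y is cleaned with dropna().
y = demo["daily pct return"].replace([np.inf, -np.inf], np.nan).dropna()
x1 = demo["Excess daily return"]

print(y.shape)   # one row shorter than x1
print(x1.shape)

# Comparing the indexes reveals which row labels y lost.
missing_rows = x1.index.difference(y.index)
print(list(missing_rows))
```

Comparing the indexes this way identifies the offending rows directly, instead of guessing that it is the first one.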

My data frame looks like this:

  daily pct return | Excess daily return | weekly pct return | index weekly pct return | Excess weekly return | monthly pct return | index monthly pct return | Excess monthly return | Trading vol / mkt cap |   Std dev   
------------------|---------------------|-------------------|-------------------------|----------------------|--------------------|--------------------------|-----------------------|-----------------------|-------------
| | | | | | | | 0.207582827 |
0.262658228 | 0.322397801 | | | | | | | 0.285585677 |
0.072681704 | 0.126445534 | | | | | | | 0.272920624 |
0.135514019 | 0.068778682 | | | | | | | 0.213149083 |
-0.115226337 | -0.173681889 | | | | | | | 0.155653699 |
-0.165116279 | -0.176569405 | | | | | | | 0.033925024 |
0.125348189 | 0.079889239 | | | | | | | 0.030968484 | 0.544133212
0.022277228 | -0.044949678 | | | | | | | 0.020735381 | 0.385659608
0.150121065 | 0.102119782 | | | | | | | 0.063563881 | 0.430868447
0.336842105 | 0.333590483 | | | | | | | 0.210193049 | 0.893734807
0.011023622 | -0.011860658 | 0.320987654 | -0.657089012 | 0.978076666 | | | | 0.100468109 | 1.137976483
0.37694704 | 0.308505907 | | | | | | | 0.135828281 | 1.867394416

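Does anyone have an elegant solution for aligning the sizes of the matrices so that I no longer get this error? I think I need to drop the first row of values from the y variable ("daily pct return"), but I am not sure how to do that.

Thanks in advance!!

Best answer

Finally solved it! There were three issues: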

1) The y variable had 48392 values, while the other 6 variables each had 48393. To fix this, I added the following line of code to drop the first row:

df_raw = df_raw.drop([0])
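
Dropping row 0 by label works here only because the single missing value happens to sit in the first row. A more general way to drop rows based on specific columns only is `DataFrame.dropna(subset=...)`, which removes a row from every column at once so that all variables keep the same length. A sketch, assuming a toy frame `demo` and a column list `cols` chosen purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the asker's.
demo = pd.DataFrame({
    "daily pct return": [np.nan, 0.26, np.inf, 0.13],
    "Excess daily return": [0.10, 0.32, 0.12, 0.06],
    "Std dev": [np.nan, np.nan, 0.54, 0.38],  # gaps here are ignored below
})

# Only these columns decide whether a row is dropped.
cols = ["daily pct return", "Excess daily return"]

cleaned = (
    demo.replace([np.inf, -np.inf], np.nan)  # treat infinities as missing
        .dropna(subset=cols)                 # drop rows with gaps in `cols` only
)

y = cleaned["daily pct return"]
x1 = cleaned["Excess daily return"]
print(y.shape, x1.shape)  # same length and same index
```

Because the rows are removed from the frame rather than from one Series, every variable extracted afterwards stays aligned.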

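2) My data frame had a lot of empty cells. The regression cannot be run unless every cell has a value. So I added some code that replaces all infs and empty cells with NaN, and then fills all NaN with 0. Code snippet: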

df_raw ['daily pct return']= df_raw ['daily pct return'].replace([np.inf, -np.inf],np.nan)
df_raw = df_raw.replace(r'\s+', np.nan, regex=True).replace('', np.nan)
df_raw.fillna(value=0, axis=1,inplace=True)
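
Two caveats about this cleaning step, offered as a sketch rather than a correction of the author's result. The pattern `r'\s+'` matches any string that contains whitespace, so anchoring it as `r'^\s*$'` restricts the replacement to cells that are empty or whitespace-only. Also, filling gaps with 0 feeds artificial observations into the regression, and whether that is acceptable depends on the data. The toy frame `demo` below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical string-typed data with blanks and padded numbers.
demo = pd.DataFrame({
    "daily pct return": ["0.26", "   ", "", "0.13"],
    "Std dev": ["0.54", "0.38", " 0.43 ", ""],
})

# Blank or whitespace-only cells become NaN; " 0.43 " is left alone.
blank_to_nan = demo.replace(r"^\s*$", np.nan, regex=True)

# Strip padding, then convert to float; unparseable values become NaN.
numeric = blank_to_nan.apply(
    lambda col: pd.to_numeric(col.str.strip(), errors="coerce")
)

print(numeric.isna().sum())   # number of gaps per column
filled = numeric.fillna(0)    # the answer's choice: gaps become 0
dropped = numeric.dropna()    # alternative: keep complete rows only
```

Comparing `filled` with `dropped` on real data shows how many observations are invented by the zero fill.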

3) The way I had written the multivariate regression call was wrong. I corrected it as follows:

result = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6', data=df_raw).fit()

Putting it all together, my updated code now reads:

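import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df_raw is assumed to have been loaded earlier (not shown in the original post)
df_raw = pd.DataFrame(data=df_raw)
# Drop the first row (index label 0)
df_raw = df_raw.drop([0])
# Replace infinities and blank cells with NaN, then fill every NaN with 0
df_raw['daily pct return'] = df_raw['daily pct return'].replace([np.inf, -np.inf], np.nan)
df_raw = df_raw.replace(r'\s+', np.nan, regex=True).replace('', np.nan)
df_raw.fillna(value=0, axis=1, inplace=True)
df_raw.to_csv('Raw_final.csv', header=True)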


# Define variables for regression
y = (df_raw['daily pct return']).astype(float)
x1 = (df_raw['Excess daily return']).astype(float)
x2 = (df_raw['Excess weekly return']).astype(float)
x3 = (df_raw['Excess monthly return']).astype(float)
x4 = (df_raw['Trading vol / mkt cap']).astype(float)
x5 = (df_raw['Std dev']).astype(float)
x6 = (df_raw['Residual risk']).astype(float)

# Check shape of variables to confirm they are of the same size
print(y.shape)
print(x1.shape)
print(x2.shape)
print(x3.shape)
print(x4.shape)
print(x5.shape)
print(x6.shape)

# Perform regression
# (y, x1..x6 are not columns of df_raw; the formula resolves them from the local variables above)
result = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6', data=df_raw).fit()
print(result.params)
print(result.summary())

Regarding python - ValueError: endog and exog matrices are different sizes - how to drop data in specific columns only?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53684234/
