
python - How to add a sum-to-zero constraint to a GLM in Python?

Reposted. Author: 行者123. Updated: 2023-12-01 04:46:43

I have set up a model in Python using the statsmodels glm function, but now I want to add a sum-to-zero constraint to the model.

The model is defined as follows:

import statsmodels.api as sm  # needed for sm.families
import statsmodels.formula.api as smf
model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()

In R, to add the constraint I would simply do the following:

model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))

This puts sum-to-zero constraints on C and D, but I'm not sure how to achieve the same thing in Python.

I have seen that a fit_constrained() method is available, but I'm not sure how to use it, or whether it is the right way to get what I need.

http://statsmodels.sourceforge.net/devel/generated/statsmodels.genmod.generalized_linear_model.GLM.fit_constrained.html#statsmodels.genmod.generalized_linear_model.GLM.fit_constrained

Can anyone offer advice on applying this constraint?

Best answer

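Here is an example that uses the Gaussian family to illustrate fit_constrained, because I did not quickly find a Poisson example with a categorical variable.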

import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm

# Load the hsb2 data set (comma-separated)
url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_csv(url)

# No intercept ("- 1"), so every level of race gets its own coefficient
mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()
print(res.summary())

Constrain all coefficients to sum to zero:

res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                   write   No. Observations:                 200
Model:                             GLM   Df Residuals:                     197
Model Family:                 Gaussian   Df Model:                           2
Link Function:                identity   Scale:                  1232.08314649
Method:                           IRLS   Log-Likelihood:               -993.41
Date:                 Wed, 25 Mar 2015   Deviance:                  2.4149e+05
Time:                         16:42:37   Pearson chi2:                2.41e+05
No. Iterations:                      1
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================

Model has been estimated subject to linear equality constraints.

Multiple constraints are separated by commas, and each one defaults to being equal to zero:

res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())

which prints

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                   write   No. Observations:                 200
Model:                             GLM   Df Residuals:                     198
Model Family:                 Gaussian   Df Model:                           1
Link Function:                identity   Scale:                  1438.99574167
Method:                           IRLS   Log-Likelihood:               -1008.9
Date:                 Wed, 25 Mar 2015   Deviance:                  2.8204e+05
Time:                         16:42:37   Pearson chi2:                2.82e+05
No. Iterations:                      1
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================

Model has been estimated subject to linear equality constraints.
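
The two pairwise constraints can also be written as a matrix equation R @ params = q. As far as I can tell from the fit_constrained docstring, a tuple (R, q) is accepted in place of the string. The following is a hedged sketch on synthetic data (the generated race and write columns and the seed are assumptions, not the hsb2 data), with one row of R per constraint and one column per parameter, in the order of mod.exog_names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for hsb2 (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
df = pd.DataFrame({"race": rng.choice(["1", "2", "3", "4"], size=200)})
df["write"] = rng.normal(50.0, 10.0, size=200)

mod = smf.glm("write ~ C(race) - 1", data=df)  # Gaussian family by default

# Row 1: first + second coefficient = 0; row 2: third + fourth coefficient = 0.
R = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
q = np.zeros(2)

res = mod.fit_constrained((R, q))
params = np.asarray(res.params)

print(dict(zip(mod.exog_names, params)))
print("R @ params:", R @ params)  # each entry should be numerically zero
```

The matrix form is convenient when the constraints are generated programmatically, since no parameter names have to be spelled out in a string.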

I'm not sure how Patsy formulas can be made to keep every level (drop none) when there is more than one categorical explanatory variable.
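
One way to see what Patsy does here is to inspect the design matrix directly. The sketch below uses a tiny made-up data frame (the lowercase column names c and d are hypothetical, chosen to avoid clashing with Patsy's C() function). It first shows the default coding without an intercept, then Patsy's Sum contrast, which to my understanding is the counterpart of R's contr.sum, and finally the contrast matrix itself.

```python
import pandas as pd
from patsy import dmatrix
from patsy.contrasts import Sum

# Tiny hypothetical data set: c has three levels, d has two.
df = pd.DataFrame({
    "c": ["a", "b", "c", "a", "b", "c"],
    "d": ["x", "y", "x", "y", "x", "y"],
})

# Without an intercept, the first factor keeps all of its levels, while
# later factors lose one level to keep the design matrix full rank.
no_intercept = dmatrix("0 + C(c) + C(d)", df)
no_intercept_cols = no_intercept.design_info.column_names
print(no_intercept_cols)

# Sum (deviation) coding: each factor gets (levels - 1) columns, and the
# omitted level's effect is minus the sum of the others.
sum_coded = dmatrix("C(c, Sum) + C(d, Sum)", df)
sum_coded_cols = sum_coded.design_info.column_names
print(sum_coded_cols)

# The contrast matrix behind Sum coding for three levels.
sum_matrix = Sum().code_without_intercept(["a", "b", "c"]).matrix
print(sum_matrix)
```

Because statsmodels formulas are passed through Patsy, a term such as C(c, Sum) should also be usable directly in the glm formula, which builds the sum-to-zero parameterisation into the design matrix instead of imposing it through fit_constrained. Note that this is a reparameterisation of a model with an intercept, so it is not the same thing as the constrained no-intercept fits shown above.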

Regarding python - How to add a sum-to-zero constraint to a GLM in Python?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29261018/
