
python - Categorical variable 'C()' in a statsmodels/patsy hypothesis test constraint

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:35:07

Hello, I am running the following model with statsmodels, and it works fine.

from statsmodels.formula.api import ols
from statsmodels.iolib.summary2 import summary_col  # for summary stats of large tables

# Formula fragments: time fixed effects and weather covariates
time_FE_str = ' + C(hour_of_day) + C(day_of_week) + C(week_of_year)'
weather_2_str = ' + C(weather_index) + rain + extreme_temperature + wind_speed'
model = ols("activity_count ~ C(city_id)" + weather_2_str + time_FE_str, data=df)
results = model.fit()
print(summary_col(results).tables)

print('F-TEST:')
# The 'C(weather_index) = 0' constraint is what triggers the PatsyError in the traceback
hypotheses = '(C(weather_index) = 0), (rain=0), (extreme_temperature=0), (wind_speed=0)'
f_test = results.f_test(hypotheses)

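However, when I want to include the categorical variable C(weather_index), I don't know how to formulate the hypothesis for the F-test. I have tried every version I could think of, but I always get an error.

Has anyone run into this problem before?

Any ideas?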

F-TEST:
Traceback (most recent call last):
File "C:/VK/scripts_python/predict_activity.py", line 95, in <module>
f_test = results.f_test(hypotheses)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\statsmodels\base\model.py", line 1375, in f_test
invcov=invcov, use_f=True)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\statsmodels\base\model.py", line 1437, in wald_test
LC = DesignInfo(names).linear_constraint(r_matrix)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\design_info.py", line 536, in linear_constraint
return linear_constraint(constraint_likes, self.column_names)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 391, in linear_constraint
tree = parse_constraint(code, variable_names)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 225, in parse_constraint
return infix_parse(_tokenize_constraint(string, variable_names),
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 184, in _tokenize_constraint
Origin(string, offset, offset + 1))
patsy.PatsyError: unrecognized token in constraint
(C(weather_index) = 0), (rain=0), (extreme_temperature=0), (wind_speed=0)
^

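Best answer

The methods t_test, wald_test, and f_test test hypotheses on individual parameters directly; they do not operate on an entire categorical or composite effect.

results.summary() shows the parameter names that patsy created for the categorical variable. These names can be used to build contrasts or restrictions for the categorical effect.

As an alternative, anova_lm directly computes the hypothesis test for a whole term, e.g. that a categorical variable has no effect.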
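A minimal sketch of that idea, under stated assumptions: the data below is synthetic (the asker's df is not available), the formula is a shortened version of the one in the question, and the restriction is passed to f_test as a 0/1 matrix rather than as a constraint string. The names tested and R are illustrative only.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the asker's data; every value here is made up.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "weather_index": rng.integers(0, 3, size=n),  # three levels: 0, 1, 2
    "rain": rng.normal(size=n),
    "wind_speed": rng.normal(size=n),
})
df["activity_count"] = (
    10.0 + 2.0 * (df["weather_index"] == 2) + 0.5 * df["rain"] + rng.normal(size=n)
)

results = ols("activity_count ~ C(weather_index) + rain + wind_speed", data=df).fit()

# Parameter names as generated for the fitted model; the dummy columns of the
# categorical variable all start with 'C(weather_index)'.
names = list(results.params.index)
print(names)

# Select every parameter that should be restricted to zero.
tested = [
    p for p in names
    if p.startswith("C(weather_index)") or p in ("rain", "wind_speed")
]

# One row per tested parameter, with a 1 in that parameter's column.
R = np.zeros((len(tested), len(names)))
for row, p in enumerate(tested):
    R[row, names.index(p)] = 1.0

# Joint F-test of R @ params = 0.
f_test = results.f_test(R)
print(f_test)
```

Building the matrix from results.params.index avoids typing the generated names by hand, so the same pattern should carry over when the categorical variable has many levels.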
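The anova_lm route could look roughly like the following. This is a sketch on synthetic data, not the asker's model: the column names mirror the question, but the values and the shortened formula are assumptions, and typ=2 is just one of the available ANOVA types.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in for the asker's data; every value here is made up.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "weather_index": rng.integers(0, 3, size=n),  # three levels: 0, 1, 2
    "rain": rng.normal(size=n),
})
df["activity_count"] = (
    10.0 + 2.0 * (df["weather_index"] == 2) + 0.5 * df["rain"] + rng.normal(size=n)
)

results = ols("activity_count ~ C(weather_index) + rain", data=df).fit()

# Type II ANOVA: one row per formula term, so the categorical variable
# is tested as a whole instead of one dummy parameter at a time.
anova_table = anova_lm(results, typ=2)
print(anova_table)

# F statistic and p-value for the hypothesis that weather_index has no effect.
weather_row = anova_table.loc["C(weather_index)"]
print(weather_row["F"], weather_row["PR(>F)"])
```

The table is indexed by term name as written in the formula, so no patsy-generated parameter names are needed here.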

Regarding "python - Categorical variable 'C()' in a statsmodels/patsy hypothesis test constraint", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48097949/
