
python-3.x - python3 - pandas: determining whether an event's occurrence is statistically significant

Reposted. Author: 行者123. Last updated: 2023-12-05 02:58:36

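I have a large dataset that looks like the one below. I want to know whether there is a statistically significant difference between the times the event occurs and the times it does not. The assumption here is that a higher percent change is more meaningful/better.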

In another dataset, the "event occurs" column takes the values "True, False, Neutral". (Please ignore the index; it is just the default pandas index.)

index  event occurs  percent change
148    False         11.27
149    True          14.56
150    False         10.35
151    False          6.07
152    False         21.14
153    False          7.26
154    False          7.07
155    False          5.37
156    True           2.75
157    False          7.12
158    False          7.24
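
Before running any test, it can help to look at the group sizes and averages. The following is a minimal sketch that rebuilds only the rows shown above (the full dataset is not available here) and summarizes the percent change per group; the variable names `sample` and `summary` are placeholders chosen for illustration.

```python
import pandas as pd

# Rows copied from the excerpt shown in the question.
sample = pd.DataFrame(
    {
        "event occurs": [False, True, False, False, False, False,
                         False, False, True, False, False],
        "percent change": [11.27, 14.56, 10.35, 6.07, 21.14, 7.26,
                           7.07, 5.37, 2.75, 7.12, 7.24],
    },
    index=range(148, 159),
)

# Per-group size, mean, median and spread of the percent change.
summary = sample.groupby("event occurs")["percent change"].agg(
    ["count", "mean", "median", "std"]
)
print(summary)
```

In this excerpt only two rows have the event occurring, so any test on these eleven rows alone would have very little power; the tests in the answer below only make sense on the full dataset.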

What is the best way to determine significance when the column is "True/False" or "True/False/Neutral"?

Best answer

Load Packages, Set Globals, Make Data.

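import numpy as np
import pandas as pd
import scipy.stats as stats

n = 60
stat_sig_thresh = 0.05

# synthetic data: a random True/False flag and a random percent change between 0.1 and 99.9
event_perc = pd.DataFrame({"event occurs": np.random.choice([True, False], n),
                           "percent change": [i * .1 for i in np.random.randint(1, 1000, n)]})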

Determine if Distribution is Normal

# run the normality test on the "percent change" values of each group only
rows = []
for occurred, group in event_perc.groupby("event occurs"):
    statistic, pvalue = stats.normaltest(group["percent change"])
    rows.append({"event occurs": occurred, "statistic": statistic, "pvalue": pvalue})
stat_sig = pd.DataFrame(rows)
# a p-value at or below the threshold rejects normality for that group
stat_sig["Normal"] = stat_sig["pvalue"] > stat_sig_thresh

>>>stat_sig

   event occurs  statistic    pvalue  Normal
0         False   2.917192  0.232563    True
1          True   2.938333  0.230117    True
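
As a variation on the normality step, here is a hedged sketch that uses the Shapiro-Wilk test (`scipy.stats.shapiro`) instead of `normaltest`. This is a swapped-in alternative, not part of the original answer. The helper name `shapiro_by_group`, the seed, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)  # seeded so the sketch is repeatable
data = pd.DataFrame(
    {
        "event occurs": rng.choice([True, False], 60),
        "percent change": rng.integers(1, 1000, 60) * 0.1,
    }
)

def shapiro_by_group(df, group_col, value_col, alpha=0.05):
    """Run a Shapiro-Wilk test on value_col within each level of group_col."""
    rows = []
    for level, group in df.groupby(group_col):
        result = stats.shapiro(group[value_col])
        rows.append(
            {
                group_col: level,
                "n": len(group),
                "statistic": result.statistic,
                "pvalue": result.pvalue,
                # A small p-value is evidence against normality.
                "Normal": result.pvalue > alpha,
            }
        )
    return pd.DataFrame(rows)

normality = shapiro_by_group(data, "event occurs", "percent change")
print(normality)
```

The returned frame has one row per group, with a boolean "Normal" column that can be used the same way as the one in `stat_sig`.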

Determine Statistical Significance

normal = [bool(i) for i in stat_sig.Normal.unique().tolist()]

rvs1 = event_perc["percent change"][event_perc["event occurs"] == True]
rvs2 = event_perc["percent change"][event_perc["event occurs"] == False]

if len(normal) == 1 and normal[0]:
    print("the distributions are normal")
    # both groups look normal: independent two-sample t-test
    if stats.ttest_ind(rvs1, rvs2).pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of identical average scores
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of equal averages
        print("there is a statistically significant difference")

elif len(normal) == 1 and not normal[0]:
    print("the distributions are not normal")
    # Mann-Whitney U is used here in place of stats.wilcoxon, which is a paired
    # test and requires equal-length samples; these two groups are independent
    if stats.mannwhitneyu(rvs1, rvs2, alternative="two-sided").pvalue >= stat_sig_thresh:
        # we cannot reject the null hypothesis of identical distributions
        print("we can't say whether there is statistically significant difference")
    else:
        # we reject the null hypothesis of identical distributions
        print("there is a statistically significant difference")
else:
    # one group passed the normality test and the other did not
    print("one group looks normal and the other does not")

the distributions are normal
we can't say whether there is statistically significant difference
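
The code above covers only the True/False case. For the "True/False/Neutral" dataset mentioned in the question, one possible approach is a test across all three groups at once: one-way ANOVA (`scipy.stats.f_oneway`) if the groups look roughly normal, or Kruskal-Wallis (`scipy.stats.kruskal`) if they do not. The sketch below is an assumption about how that could look, not part of the original answer; the data is synthetic and the names `three_way` and `groups` are placeholders.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
n = 90
three_way = pd.DataFrame(
    {
        "event occurs": rng.choice(["True", "False", "Neutral"], n),
        "percent change": rng.integers(1, 1000, n) * 0.1,
    }
)

# One array of percent changes per label.
groups = [g["percent change"].to_numpy()
          for _, g in three_way.groupby("event occurs")]

anova = stats.f_oneway(*groups)    # assumes roughly normal groups
kruskal = stats.kruskal(*groups)   # rank-based, no normality assumption

print(f"ANOVA p-value: {anova.pvalue:.3f}")
print(f"Kruskal-Wallis p-value: {kruskal.pvalue:.3f}")
```

Either test only says whether at least one group differs from the others. Finding out which pair differs would need follow-up pairwise comparisons with a multiple-comparison correction.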

Regarding "python-3.x - python3 - pandas: determining whether an event's occurrence is statistically significant", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58770711/
