
python - How to find Shapiro-Wilk using python pandas?

Reposted. Author: 行者123. Updated: 2023-11-28 17:04:32

I need to run the Shapiro-Wilk test on a data frame.

About Shapiro-Wilk: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html

Data frame 1:

Stationid
10
11
12
13
14
15
16
17

Data frame 2:

Stationid  Maintanance
10 55
15 38
21 100
10 56
22 101
15 39
10 56

I need the Shapiro-Wilk test computed on data frame 2, for the station ids that appear in data frame 1.

Expected output

Stationid   W           P 
10 0.515 55.666667
15 0.555 38.500000

Note: the W and P values given in the table are not the correct values.

Best answer

First filter with isin, then use GroupBy.apply, converting the output to a Series so that it becomes new columns:

#check if numeric
print (df2['Maintanance'].dtypes)
int64

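import pandas as pd
from scipy.stats import shapiro

# keep only the rows of df2 whose Stationid also appears in df1
df3 = df2[df2['Stationid'].isin(df1['Stationid'])]

# caution: no column is selected here, so each group x is a whole sub-DataFrame;
# use groupby('Stationid')[['Maintanance']] to test only the Maintanance values
df = (df3.groupby('Stationid')
         .apply(lambda x: pd.Series(shapiro(x), index=['W','P']))
         .reset_index())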
print (df)
Stationid W P
0 10 0.689908 0.004831
1 15 0.747003 0.036196
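
To make the filter-then-group idea easier to try in isolation, here is a minimal sketch on the question's numeric data. It is an assumption-laden variant, not the answer's exact code: it selects the Maintanance column explicitly, loops over the groups instead of using GroupBy.apply, and returns NaN for stations with fewer than 3 values. The names rows and result are only illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

# data frames from the question
df1 = pd.DataFrame({'Stationid': [10, 11, 12, 13, 14, 15, 16, 17]})
df2 = pd.DataFrame({'Stationid': [10, 15, 21, 10, 22, 15, 10],
                    'Maintanance': [55, 38, 100, 56, 101, 39, 56]})

# keep only the stations listed in df1
df3 = df2[df2['Stationid'].isin(df1['Stationid'])]

rows = []
for station, values in df3.groupby('Stationid')['Maintanance']:
    if len(values) >= 3:
        # shapiro returns a (statistic, p-value) pair
        w, p = shapiro(values.to_numpy())
    else:
        # assumption: report NaN rather than test a group that is too small
        w, p = np.nan, np.nan
    rows.append({'Stationid': station, 'W': w, 'P': p})

result = pd.DataFrame(rows, columns=['Stationid', 'W', 'P'])
print(result)
```

With this data, station 10 has three values and gets a W statistic, while station 15 has only two values and gets NaN.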

Edit:

import pandas as pd

data = ['abc15','acv1','acv2','acv3','acv4','abc18','acv5','acv6']
df1 = pd.DataFrame(data,columns=['Stationid'])
print (df1)
Stationid
0 abc15
1 acv1
2 acv2
3 acv3
4 acv4
5 abc18
6 acv5
7 acv6

data1=[['abc15',55],['abc18',38],['ark',100],['abc15',56],['ark',101],['abc19',39],['abc15',56]]
df2=pd.DataFrame(data1,columns=['Stationid','Maintanance'])
print(df2)
Stationid Maintanance
0 abc15 55
1 abc18 38
2 ark 100
3 abc15 56
4 ark 101
5 abc19 39
6 abc15 56

The problem is that shapiro does not work if the number of values is less than 3, so a filter for groups with length > 2 was added:

import numpy as np
from scipy.stats import shapiro
df3 = df2[df2['Stationid'].isin(df1['Stationid'])]
print (df3)
Stationid Maintanance
0 abc15 55
1 abc18 38 < group with length 1 (abc18)
3 abc15 56
6 abc15 56

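# groups with fewer than 3 rows get NaN instead of calling shapiro;
# [['Maintanance']] keeps the string Stationid values out of the test
df = (df3.groupby('Stationid')[['Maintanance']]
         .apply(lambda x: pd.Series(shapiro(x), index=['W','P']) if len(x) > 2
                else pd.Series([np.nan, np.nan], index=['W','P']))
         .reset_index())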
print (df)
Stationid W P
0 abc15 0.75 -0.000001
1 abc18 NaN NaN
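
Before applying the length guard, it can help to see which stations are too small for the test. The following is a small sketch, assuming the filtered df3 shown above; sizes and too_small are illustrative names, not part of the original answer.

```python
import pandas as pd

# the filtered frame from above, rebuilt so the example runs on its own
df3 = pd.DataFrame({'Stationid': ['abc15', 'abc18', 'abc15', 'abc15'],
                    'Maintanance': [55, 38, 56, 56]},
                   index=[0, 1, 3, 6])

# number of rows per station
sizes = df3.groupby('Stationid').size()

# stations with fewer than 3 values, which get NaN above
too_small = sizes[sizes < 3].index.tolist()

print(sizes)
print(too_small)
```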

Or filter out this group:

from scipy.stats import shapiro
df3 = df2[df2['Stationid'].isin(df1['Stationid'])]
print (df3)
Stationid Maintanance
0 abc15 55
1 abc18 38
3 abc15 56
6 abc15 56

df3 = df3[df3.groupby('Stationid')['Stationid'].transform('size') > 2]
print (df3)
Stationid Maintanance
0 abc15 55
3 abc15 56
6 abc15 56

df = (df3.groupby('Stationid')[['Maintanance']]
.apply(lambda x: pd.Series(shapiro(x), index=['W','P']))
.reset_index())
print (df)
Stationid W P
0 abc15 0.75 -0.000001
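
As a possible alternative to the transform('size') mask, pandas also offers GroupBy.filter, which drops whole groups that fail a condition. This is a sketch of that swapped-in technique rather than the answer's method; kept is an illustrative name.

```python
import pandas as pd

# the filtered frame from above, rebuilt so the example runs on its own
df3 = pd.DataFrame({'Stationid': ['abc15', 'abc18', 'abc15', 'abc15'],
                    'Maintanance': [55, 38, 56, 56]},
                   index=[0, 1, 3, 6])

# keep only stations with more than 2 rows; original row order and index are preserved
kept = df3.groupby('Stationid').filter(lambda g: len(g) > 2)
print(kept)
```

The transform('size') version is usually faster on large frames because it avoids calling a Python function per group, so filter is mainly a readability choice.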

Regarding python - How to find Shapiro-Wilk using python pandas?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51928254/
