
python-3.x - Python - Calculating the standard deviation of a DataFrame column (row level)

Reposted. Author: 行者123. Updated: 2023-12-04 19:40:22

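I created a Pandas DataFrame and was able to determine the standard deviation of one or more of its columns (column level). I need to determine the standard deviation across all rows for a specific column. Below are the commands I have tried so far.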

# Will determine the standard deviation of all the numerical columns by default.
inp_df.std()

salary 8.194421e-01
num_months 3.690081e+05
no_of_hours 2.518869e+02
# Same as above command. Performs the standard deviation at the column level.
inp_df.std(axis = 0)
# Determines the standard deviation over only the salary column of the dataframe.
inp_df[['salary']].std()

salary 8.194421e-01
# Determines Standard Deviation for every row present in the dataframe. But it
# does this for the entire row and it will output values in a single column.
# One std value for each row.
inp_df.std(axis=1)

0 4.374107e+12
1 4.377543e+12
2 4.374026e+12
3 4.374046e+12
4 4.374112e+12
5 4.373926e+12
When I run the following command, every record comes back as NaN. Is there a way to fix this?
# Trying to determine standard deviation only for the "salary" column at the
# row level.
inp_df[['salary']].std(axis = 1)

0 NaN
1 NaN
2 NaN
3 NaN
4 NaN

Best answer

This is expected. The documentation for DataFrame.std says:

Normalized by N-1 by default. This can be changed using the ddof argument


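With a single element, N - 1 is 0, so you are dividing by 0. That is why taking the sample standard deviation across columns (axis=1) of a one-column DataFrame returns all missing values.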
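To make the division by zero concrete, here is a minimal sketch that computes the standard deviation by hand with an explicit ddof. The helper name std_with_ddof is hypothetical and not part of pandas; it only mirrors the N - ddof normalization quoted above and returns NaN when the divisor would be zero or negative.

```python
import math

import pandas as pd


def std_with_ddof(values, ddof=1):
    """Hypothetical helper: standard deviation normalized by (N - ddof)."""
    n = len(values)
    if n - ddof <= 0:
        # Nothing to divide by: report a missing value, as pandas does.
        return float("nan")
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


single = [10]
print(std_with_ddof(single, ddof=1))  # divisor is 1 - 1 = 0
print(std_with_ddof(single, ddof=0))  # divisor is 1 - 0 = 1
print(pd.Series(single).std())        # pandas default is ddof=1
```

With one value per row, only ddof=0 yields a number, and that number is always 0 because a single value has no spread.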
Sample:
import pandas as pd

inp_df = pd.DataFrame({'salary': [10, 20, 30],
                       'num_months': [1, 2, 3],
                       'no_of_hours': [2, 5, 6]})
print(inp_df)
   salary  num_months  no_of_hours
0      10           1            2
1      20           2            5
2      30           3            6
Selecting one column with single brackets [] returns a Series:
print (inp_df['salary'])
0 10
1 20
2 30
Name: salary, dtype: int64
Calling std on a Series returns a scalar:
print (inp_df['salary'].std())
10.0
Selecting one column with double brackets [[]] returns a one-column DataFrame:
print (inp_df[['salary']])
salary
0 10
1 20
2 30
Calling std on the DataFrame along the index (the default, axis=0) returns a one-element Series:
print (inp_df[['salary']].std())
# same as
# print (inp_df[['salary']].std(axis=0))
salary 10.0
dtype: float64
Calling std on the DataFrame along the columns (axis=1) returns all NaN:
print (inp_df[['salary']].std(axis = 1))
0 NaN
1 NaN
2 NaN
dtype: float64
If you change the default ddof=1 to ddof=0:
print (inp_df[['salary']].std(axis = 1, ddof=0))
0 0.0
1 0.0
2 0.0
dtype: float64
If you want std across two or more columns:
#select 2 columns
print (inp_df[['salary', 'num_months']])
salary num_months
0 10 1
1 20 2
2 30 3

#std by index
print (inp_df[['salary','num_months']].std())
salary 10.0
num_months 1.0
dtype: float64

#std by columns
print (inp_df[['salary','no_of_hours']].std(axis = 1))
0 5.656854
1 10.606602
2 16.970563
dtype: float64
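
If the goal is to keep the row-level result next to the data, one possible approach is to assign the output of std(axis=1) over the chosen columns to a new column. This is a sketch that assumes the inp_df from the sample above; the column name row_std and the variable cols are made up for illustration.

```python
import pandas as pd

inp_df = pd.DataFrame({'salary': [10, 20, 30],
                       'num_months': [1, 2, 3],
                       'no_of_hours': [2, 5, 6]})

# Columns to include in each row's standard deviation (at least two are
# needed for the default ddof=1 to produce a number).
cols = ['salary', 'no_of_hours']

# 'row_std' is an illustrative name, not something pandas defines.
inp_df['row_std'] = inp_df[cols].std(axis=1)
print(inp_df)
```

Selecting the columns first keeps the new row_std column itself out of the calculation if the cell is run again.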

Regarding "python-3.x - Python - Calculating the standard deviation of a DataFrame column (row level)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53809443/
