
python - Converting a Pandas DataFrame to bin frequencies

Reposted. Author: 行者123. Updated: 2023-12-01 05:15:29

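With pandas, I know how to bin a single column, but I'm struggling to work out how to bin multiple columns and then get the counts (frequencies) of the bins, since my DataFrame has 20 columns. I know I could repeat the single-column approach 20 times, but I'd like to learn a better way. Here is the head of the DataFrame, showing 4 columns: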

      Percentile1 Percentile2 Percentile3   Percentile4
395 0.166667 0.266667 0.266667 0.133333
424 0.266667 0.266667 0.133333 0.032258
511 0.032258 0.129032 0.129032 0.387097
540 0.129032 0.129032 0.387097 0.612903
570 0.129032 0.387097 0.612903 0.741935

I created the following array of bin labels:

output = ['0-10','10-20','20-30','30-40','40-50','50-60','60-70','70-80','80-90','90-100']

This is the output I want:

      Percentile1 Percentile2 Percentile3   Percentile4
395 10-20 20-30 20-30 10-20
424 20-30 20-30 10-20 0-10
511 0-10 10-20 10-20 30-40
540 10-20 10-20 30-40 60-70
570 10-20 30-40 60-70 70-80

After that, I would ideally do a frequency/value count to get something like this:

      Percentile1 Percentile2 Percentile3   Percentile4
0-10 frequency #'s
10-20
20-30
30-40
40-50
etc...

Any help would be greatly appreciated.

Best answer

I would probably do something like the following:

print(df)

Percentile1 Percentile2 Percentile3 Percentile4
0 0.166667 0.266667 0.266667 0.133333
1 0.266667 0.266667 0.133333 0.032258
2 0.032258 0.129032 0.129032 0.387097
3 0.129032 0.129032 0.387097 0.612903
4 0.129032 0.387097 0.612903 0.741935

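Now use apply and cut to create a new DataFrame in which each percentile is replaced by the decile bin it falls in (apply iterates over each column):

import pandas as pd

# Bin edges 0, 10, ..., 100; the percentiles are fractions, so scale them by 100
bins = list(range(0, 110, 10))
new = df.apply(lambda x: pd.cut(x * 100, bins))
print(new)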

Percentile1 Percentile2 Percentile3 Percentile4
0 (10, 20] (20, 30] (20, 30] (10, 20]
1 (20, 30] (20, 30] (10, 20] (0, 10]
2 (0, 10] (10, 20] (10, 20] (30, 40]
3 (10, 20] (10, 20] (30, 40] (60, 70]
4 (10, 20] (30, 40] (60, 70] (70, 80]
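The question asked for string labels such as '10-20' rather than interval notation. A minimal sketch of one way to get them is to pass the `labels` argument to `pd.cut`. The name `labels` below is an assumption for illustration (it holds the same list the question calls `output`), and the DataFrame is rebuilt from the sample rows in the question.

```python
import pandas as pd

# Sample rows copied from the question, keeping its original index
df = pd.DataFrame(
    {
        "Percentile1": [0.166667, 0.266667, 0.032258, 0.129032, 0.129032],
        "Percentile2": [0.266667, 0.266667, 0.129032, 0.129032, 0.387097],
        "Percentile3": [0.266667, 0.133333, 0.129032, 0.387097, 0.612903],
        "Percentile4": [0.133333, 0.032258, 0.387097, 0.612903, 0.741935],
    },
    index=[395, 424, 511, 540, 570],
)

bins = list(range(0, 110, 10))  # 11 edges define 10 bins
# One label per bin: '0-10', '10-20', ..., '90-100'
labels = ["{}-{}".format(lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]

# pd.cut needs exactly len(bins) - 1 labels
labeled = df.apply(lambda col: pd.cut(col * 100, bins=bins, labels=labels))
print(labeled)
```

By default `pd.cut` uses right-closed intervals, so a value of exactly 0 would fall outside the first bin and become NaN unless `include_lowest=True` is passed.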

Use apply again to get the relative frequencies:

print(new.apply(lambda x: x.value_counts() / x.count()))

Percentile1 Percentile2 Percentile3 Percentile4
(0, 10] 0.2 NaN NaN 0.2
(10, 20] 0.6 0.4 0.4 0.2
(20, 30] 0.2 0.4 0.2 NaN
(30, 40] NaN 0.2 0.2 0.2
(60, 70] NaN NaN 0.2 0.2
(70, 80] NaN NaN NaN 0.2
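Dividing the counts by the number of non-null values can also be written with `value_counts(normalize=True)`. The sketch below checks that equivalence on a single column (Percentile1 from the sample data). The variable names are illustrative, and `sort=False` is used only so both results share the same bin order.

```python
import pandas as pd

# Percentile1 values from the sample data
s = pd.Series([0.166667, 0.266667, 0.032258, 0.129032, 0.129032], name="Percentile1")
binned = pd.cut(s * 100, bins=list(range(0, 110, 10)))

# Manual version, as in the answer: counts divided by the number of non-null rows
manual = binned.value_counts(sort=False) / binned.count()

# Built-in version: normalize=True returns proportions instead of counts
builtin = binned.value_counts(sort=False, normalize=True)

print(builtin)
```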

Or the raw value counts:

print(new.apply(lambda x: x.value_counts()))

Percentile1 Percentile2 Percentile3 Percentile4
(0, 10] 1 NaN NaN 1
(10, 20] 3 2 2 1
(20, 30] 1 2 1 NaN
(30, 40] NaN 1 1 1
(60, 70] NaN NaN 1 1
(70, 80] NaN NaN NaN 1

Alternatively, skip the intermediate DataFrame (which I called new) and do the value counts directly in a single command:

print(df.apply(lambda x: pd.cut(x * 100, bins).value_counts()))

Percentile1 Percentile2 Percentile3 Percentile4
(0, 10] 1 NaN NaN 1
(10, 20] 3 2 2 1
(20, 30] 1 2 1 NaN
(30, 40] NaN 1 1 1
(60, 70] NaN NaN 1 1
(70, 80] NaN NaN NaN 1
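The table the question sketched has one row per bin label, including empty bins. Depending on the pandas version, empty bins may show up as NaN (as in the output above) or as explicit zero rows, so the following sketch normalizes both cases by reindexing on the full label list and filling gaps with 0. The names `labels`, `counts`, and `freqs` are assumptions for illustration, and the data is again the sample from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Percentile1": [0.166667, 0.266667, 0.032258, 0.129032, 0.129032],
        "Percentile2": [0.266667, 0.266667, 0.129032, 0.129032, 0.387097],
        "Percentile3": [0.266667, 0.133333, 0.129032, 0.387097, 0.612903],
        "Percentile4": [0.133333, 0.032258, 0.387097, 0.612903, 0.741935],
    },
    index=[395, 424, 511, 540, 570],
)

bins = list(range(0, 110, 10))
labels = ["{}-{}".format(lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]

# Count the bin labels column by column
counts = pd.DataFrame(
    {
        name: pd.cut(col * 100, bins=bins, labels=labels).value_counts(sort=False)
        for name, col in df.items()
    }
)

# Use plain string labels as the index, then make sure every bin has a row
counts.index = counts.index.astype(str)
counts = counts.reindex(labels).fillna(0).astype(int)

# Relative frequencies; assumes no missing values, so each column has len(df) entries
freqs = counts / len(df)

print(counts)
print(freqs)
```

Because nothing here names individual columns, the same code should apply unchanged to the full 20-column DataFrame.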

For more on python - converting a Pandas DataFrame to bin frequencies, see the similar question on Stack Overflow: https://stackoverflow.com/questions/23304733/
