python - Binning by value, except the last bin

Repost. Author: 行者123. Updated: 2023-12-01 03:45:25

I am trying to bin data as follows:

pd.cut(df['col'], np.arange(0, 1.2, 0.2), include_lowest=True)

However, I also want any values greater than 1 to fall into the last bin. I can do this in a few lines, but does anyone know a one-liner or a more Pythonic way?

PS: I do not want to use qcut. I need the bins to be defined by value, not by number of records.
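To make the problem concrete, here is a minimal sketch using made-up values (the Series `values` is hypothetical, not the asker's data). With bin edges that stop at 1, anything above 1 falls outside every bin and comes back as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the last one is greater than 1.
values = pd.Series([0.1, 0.5, 0.95, 1.3])

# Same edges as in the question: 0, 0.2, ..., 1.0
edges = np.arange(0, 1.2, 0.2)

binned = pd.cut(values, edges, include_lowest=True)

# Values above the last edge get no bin at all.
print(binned.isna().tolist())  # [False, False, False, True]
```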

Best answer

Solution 1: prepare the labels (using the first 5 rows of the DF), then replace 1 with np.inf in the bins argument:

In [67]: df
Out[67]:
          a         b         c
0  1.698479  0.337989  0.002482
1  0.903344  1.830499  0.095253
2  0.152001  0.439870  0.270818
3  0.621822  0.124322  0.471747
4  0.534484  0.051634  0.854997
5  0.980915  1.065050  0.211227
6  0.809973  0.894893  0.093497
7  0.677761  0.333985  0.349353
8  1.491537  0.622429  1.456846
9  0.294025  1.286364  0.384152

In [68]: labels = pd.cut(df.a.head(), np.arange(0,1.2, 0.2), include_lowest=True).cat.categories

In [69]: pd.cut(df.a, np.append(np.arange(0, 1, 0.2), np.inf), labels=labels, include_lowest=True)
Out[69]:
0 (0.8, 1]
1 (0.8, 1]
2 [0, 0.2]
3 (0.6, 0.8]
4 (0.4, 0.6]
5 (0.8, 1]
6 (0.8, 1]
7 (0.6, 0.8]
8 (0.8, 1]
9 (0.2, 0.4]
Name: a, dtype: category
Categories (5, object): [[0, 0.2] < (0.2, 0.4] < (0.4, 0.6] < (0.6, 0.8] < (0.8, 1]]

Explanation:

In [72]: np.append(np.arange(0, 1, 0.2), np.inf)
Out[72]: array([ 0. , 0.2, 0.4, 0.6, 0.8, inf])

In [73]: labels
Out[73]: Index(['[0, 0.2]', '(0.2, 0.4]', '(0.4, 0.6]', '(0.6, 0.8]', '(0.8, 1]'], dtype='object')
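The same idea can be sketched as a self-contained script. This is only an illustration under a few assumptions: the Series `s` reuses the first five values of column a, and the labels are written out by hand as plain strings rather than taken from `.cat.categories`, since newer pandas versions return interval objects there instead of strings.

```python
import numpy as np
import pandas as pd

# First five values of column a from the example above.
s = pd.Series([1.698479, 0.903344, 0.152001, 0.621822, 0.534484])

# Edges 0, 0.2, 0.4, 0.6, 0.8, inf: the upper edge 1 is replaced by infinity.
edges = np.append(np.arange(0, 1, 0.2), np.inf)

# Hand-written labels (an assumption of this sketch), one per bin.
labels = ["[0, 0.2]", "(0.2, 0.4]", "(0.4, 0.6]", "(0.6, 0.8]", "(0.8, 1]"]

binned = pd.cut(s, bins=edges, labels=labels, include_lowest=True)
print(binned.tolist())
# ['(0.8, 1]', '(0.8, 1]', '[0, 0.2]', '(0.6, 0.8]', '(0.4, 0.6]']
```

Note that the last label now covers every value above 0.8, so a label such as "> 0.8" may describe the bin more accurately than "(0.8, 1]".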

Solution 2: clip all values greater than 1 down to 1:

In [70]: pd.cut(df.a.clip(upper=1), np.arange(0,1.2, 0.2),include_lowest=True)
Out[70]:
0 (0.8, 1]
1 (0.8, 1]
2 [0, 0.2]
3 (0.6, 0.8]
4 (0.4, 0.6]
5 (0.8, 1]
6 (0.8, 1]
7 (0.6, 0.8]
8 (0.8, 1]
9 (0.2, 0.4]
Name: a, dtype: category
Categories (5, object): [[0, 0.2] < (0.2, 0.4] < (0.4, 0.6] < (0.6, 0.8] < (0.8, 1]]

Explanation:

In [75]: df.a
Out[75]:
0 1.698479
1 0.903344
2 0.152001
3 0.621822
4 0.534484
5 0.980915
6 0.809973
7 0.677761
8 1.491537
9 0.294025
Name: a, dtype: float64

In [76]: df.a.clip(upper=1)
Out[76]:
0 1.000000
1 0.903344
2 0.152001
3 0.621822
4 0.534484
5 0.980915
6 0.809973
7 0.677761
8 1.000000
9 0.294025
Name: a, dtype: float64
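As a rough, self-contained sketch of the clipping approach (again reusing the first five values of column a as the hypothetical Series `s`), the comparison below shows that clipping removes the NaN that the unclipped call produces, while leaving the bin edges untouched:

```python
import numpy as np
import pandas as pd

# First five values of column a from the example above.
s = pd.Series([1.698479, 0.903344, 0.152001, 0.621822, 0.534484])

# Original edges: 0, 0.2, ..., 1.0
edges = np.arange(0, 1.2, 0.2)

# Without clipping, 1.698479 lies outside the bins and becomes NaN.
unclipped = pd.cut(s, edges, include_lowest=True)

# With clipping, values above 1 are set to exactly 1 and land in the last bin.
clipped = pd.cut(s.clip(upper=1), edges, include_lowest=True)

print(unclipped.isna().sum())      # 1
print(clipped.cat.codes.tolist())  # [4, 4, 0, 3, 2]
```

Compared with Solution 1, this keeps the interval categories exactly as pd.cut generates them, at the cost of binning a modified copy of the data rather than the original values.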

Regarding "python - Binning by value, except the last bin", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39007172/
