
Pandas: add a column from a cut to a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 13:39:09

I need to record a cut of a cut (sub-bins) in a DataFrame.

If the sub-bin boundaries were the same for every cut, this would be very simple. For example,

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.random.random(100), 'B':np.random.random(100)})
# Primary bins: quintiles on column A
df['P'] = pd.qcut(df['A'], 5, labels=range(1,6)).astype(int)
# Secondary bins: quartiles on column B
df['Q'] = df.groupby(['P'])['B'].transform(lambda x: pd.qcut(x, 4, labels=range(1,5)))

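However, when the cut boundaries differ for each primary cut, I don't know how to use the transform function, or even how to get the second cut values back into the DataFrame. For example,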

subBinBounds = [[0, .1, .5, 1], [0, .3, .6, 1], [0, .2, .7, 1], [0, .4, .6, 1], [0, .2, .5, 1]]
for i in range(5):
    cut = df[df['P'] == i+1]  # P takes the values 1 through 5
    subbin = pd.cut(cut['B'], subBinBounds[i], labels=range(1,4))
    cut = cut.assign(Q=subbin.values)  # Q exists only on this temporary copy
    # But how do we get 'Q' back into df?

Best answer

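You can append each sub-Series to a list, sers, inside the loop and then concat the list. Assigning the result to df['Q'] aligns on the index, so every value lands in its original row: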

import numpy as np
import pandas as pd

# For testing: seed the random number generator so the output is reproducible
np.random.seed(100)
df = pd.DataFrame({'A':np.random.random(100), 'B':np.random.random(100)})
# Primary bins: quintiles on column A
df['P'] = pd.qcut(df['A'], 5, labels=range(1,6)).astype(int)

sers = []
subBinBounds = [[0, .1, .5, 1],[0, .3, .6, 1],[0, .2, .7, 1],[0, .4, .6, 1], [0, .2, .5, 1]]
for i in range(5):
    cut = df[df['P'] == i+1]
    subbin = pd.cut(cut['B'], subBinBounds[i], labels=range(1,4))
    sers.append(subbin)

df['Q'] = pd.concat(sers)
print(df.head(10))
          A         B  P  Q
0  0.543405  0.778289  3  3
1  0.278369  0.779598  2  3
2  0.424518  0.610328  3  2
3  0.844776  0.309000  5  2
4  0.004719  0.697735  1  3
5  0.121569  0.859618  1  3
6  0.670749  0.625324  4  3
7  0.825853  0.982408  5  3
8  0.136707  0.976500  1  3
9  0.575093  0.166694  3  1
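As a possible alternative to building a list and concatenating, the sub-bin codes can be written straight into a result Series through a boolean mask. This is only a sketch under a few assumptions: the dictionary name bounds_by_p is hypothetical (it is not in the original post), labels=False is used so that pd.cut returns 0-based integer codes instead of a categorical, and include_lowest=True is added so that a value of exactly 0 would still fall into the first sub-bin.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame({'A': np.random.random(100), 'B': np.random.random(100)})
# Primary bins: quintiles on column A
df['P'] = pd.qcut(df['A'], 5, labels=range(1, 6)).astype(int)

# Hypothetical helper mapping: primary bin label -> its own sub-bin edges
bounds_by_p = {
    1: [0, .1, .5, 1],
    2: [0, .3, .6, 1],
    3: [0, .2, .7, 1],
    4: [0, .4, .6, 1],
    5: [0, .2, .5, 1],
}

# Start from an all-NaN Series that shares df's index
q = pd.Series(np.nan, index=df.index)
for p, bounds in bounds_by_p.items():
    mask = df['P'] == p
    # labels=False gives 0-based bin codes; add 1 to match the labels 1..3
    codes = pd.cut(df.loc[mask, 'B'], bounds, labels=False, include_lowest=True)
    q.loc[mask] = codes + 1

# Every B value lies in [0, 1), so no NaN remains and the cast is safe
df['Q'] = q.astype(int)
print(df.head(10))
```

Keying the boundaries by the value of P rather than by list position avoids the i+1 offset in the loop, at the cost of a slightly longer setup.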

Regarding adding a column from a cut to a DataFrame in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41750616/
