
python - How to group by multiple columns based on a threshold and create a new column in Python

Reposted. Author: 行者123. Updated: 2023-12-03 23:27:39

I have a data frame as shown below.

Input

Invoice No  Date    Text            Vendor    Days
1000001 1/1/2020 Rent Payment A 0
1000003 2/1/2020 Rent Payment A 1
1000005 4/1/2020 Rent Payment A 2
1000007 6/1/2020 Water payment A 2
1000008 9/2/2020 Rep Payment A 34
1000010 9/2/2020 Car Payment A 0
1000011 10/2/2020 Car Payment A 1
1000012 15/2/2020 Car Payment A 5
1000013 16/2/2020 Car Payment A 1
1000015 17/2/2020 Car Payment A 1
1000002 1/1/2020 Rent Payment B -47
1000004 4/1/2020 Con Payment B 3
1000006 6/1/2020 Con Payment B 2
1000009 9/2/2020 Water payment B 34
1000014 17/2/2020 Test Payment B 8
1000016 19/2/2020 Test Payment B 2
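
To reproduce the examples, the input above can be built as a pandas DataFrame. This is a minimal sketch: the variable name `df` matches the code used in the answer, and keeping `Date` as plain strings is an assumption, since the question does not say how the dates are stored.

```python
import pandas as pd

# Rows copied from the input table in the question.
rows = [
    (1000001, "1/1/2020", "Rent Payment", "A", 0),
    (1000003, "2/1/2020", "Rent Payment", "A", 1),
    (1000005, "4/1/2020", "Rent Payment", "A", 2),
    (1000007, "6/1/2020", "Water payment", "A", 2),
    (1000008, "9/2/2020", "Rep Payment", "A", 34),
    (1000010, "9/2/2020", "Car Payment", "A", 0),
    (1000011, "10/2/2020", "Car Payment", "A", 1),
    (1000012, "15/2/2020", "Car Payment", "A", 5),
    (1000013, "16/2/2020", "Car Payment", "A", 1),
    (1000015, "17/2/2020", "Car Payment", "A", 1),
    (1000002, "1/1/2020", "Rent Payment", "B", -47),
    (1000004, "4/1/2020", "Con Payment", "B", 3),
    (1000006, "6/1/2020", "Con Payment", "B", 2),
    (1000009, "9/2/2020", "Water payment", "B", 34),
    (1000014, "17/2/2020", "Test Payment", "B", 8),
    (1000016, "19/2/2020", "Test Payment", "B", 2),
]

# Date is left as text here (an assumption); Days is an integer column.
df = pd.DataFrame(rows, columns=["Invoice No", "Date", "Text", "Vendor", "Days"])
print(df.shape)  # (16, 5)
```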

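Condition

How can I write a Python condition that checks the description (Text), vendor name (Vendor) and Days columns? If the description and vendor name are the same and Days is <= 2, those rows should be grouped together under a common group name, such as G1. Every other row can be assigned its own unique group name. Each set of grouped rows should have a unique group name, as shown in the output.

Expected output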
Invoice No  Date        Text          Vendor   Days    Group
1000001 1/1/2020 Rent Payment A 0 G1
1000003 2/1/2020 Rent Payment A 1 G1
1000005 4/1/2020 Rent Payment A 2 G1
1000007 6/1/2020 Water payment A 2 G2
1000008 9/2/2020 Rep Payment A 34 G3
1000010 9/2/2020 Car Payment A 0 G4
1000011 10/2/2020 Car Payment A 1 G4
1000012 15/2/2020 Car Payment A 5 G5
1000013 16/2/2020 Car Payment A 1 G5
1000015 17/2/2020 Car Payment A 1 G5
1000002 1/1/2020 Rent Payment B -47 G6
1000004 4/1/2020 Con Payment B 3 G7
1000006 6/1/2020 Con Payment B 2 G7
1000009 9/2/2020 Water payment B 34 G8
1000014 17/2/2020 Test Payment B 8 G9
1000016 19/2/2020 Test Payment B 2 G9

Best answer

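You need to use groupby on three items: 'Text', 'Vendor', and a key derived from a boolean that flags whether 'Days' rises by more than 2 from one row to the next within the groups defined by ['Text', 'Vendor'] alone.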
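
As a rough illustration of that third grouping key, the sketch below computes it for a subset of the question's rows (vendor A, two descriptions). The helper name `run_id` and the column name `run` are hypothetical and chosen only for this example; the logic is the same chain the answer uses.

```python
import pandas as pd

# Subset of the question's data: Rent Payment and Car Payment rows for vendor A.
df = pd.DataFrame({
    "Text": ["Rent Payment"] * 3 + ["Car Payment"] * 5,
    "Vendor": ["A"] * 8,
    "Days": [0, 1, 2, 0, 1, 5, 1, 1],
})

def run_id(days):
    # Counter that goes up by 1 each time Days rises by more than 2
    # compared with the previous row of the same (Text, Vendor) group.
    return days.diff().fillna(0).gt(2).cumsum()

run = df.groupby(["Text", "Vendor"])["Days"].transform(run_id).rename("run")
with_run = df.assign(run=run)
print(with_run)

# The three keys together identify the final groups.
n_groups = with_run.groupby(["Text", "Vendor", "run"]).ngroups
print(n_groups)  # 3
```

The Rent Payment rows all share run 0, while the Car Payment rows split into run 0 and run 1 at the jump from 1 to 5.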

After that, you need to name the unique groups. I provide two methods below.
ngroup

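# Run counter per (Text, Vendor) group: goes up by 1 each time Days rises by more than 2
f = lambda x: x.diff().fillna(0).gt(2).cumsum()
d = df.groupby(['Text', 'Vendor']).Days.transform(f)
# Number each (Text, Vendor, run) combination in order of first appearance
g = df.groupby(['Text', 'Vendor', d], sort=False).ngroup()
# Turn 0, 1, 2, ... into 'G1', 'G2', 'G3', ...
df.assign(Group=g.add(1).astype(str).radd('G'))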

Invoice No Date Text Vendor Days Group
0 1000001 1/1/2020 Rent Payment A 0 G1
1 1000003 2/1/2020 Rent Payment A 1 G1
2 1000005 4/1/2020 Rent Payment A 2 G1
3 1000007 6/1/2020 Water payment A 2 G2
4 1000008 9/2/2020 Rep Payment A 34 G3
5 1000010 9/2/2020 Car Payment A 0 G4
6 1000011 10/2/2020 Car Payment A 1 G4
7 1000012 15/2/2020 Car Payment A 5 G5
8 1000013 16/2/2020 Car Payment A 1 G5
9 1000015 17/2/2020 Car Payment A 1 G5
10 1000002 1/1/2020 Rent Payment B -47 G6
11 1000004 4/1/2020 Con Payment B 3 G7
12 1000006 6/1/2020 Con Payment B 2 G7
13 1000009 9/2/2020 Water payment B 34 G8
14 1000014 17/2/2020 Test Payment B 8 G9
15 1000016 19/2/2020 Test Payment B 2 G9
factorize

f = lambda x: x.diff().fillna(0).gt(2).cumsum()
d = df.groupby(['Text', 'Vendor']).Days.transform(f)
# Factorize the (Text, Vendor, run) tuples; codes follow the order of first appearance
g = pd.factorize([*zip(df.Text, df.Vendor, d)])[0]
df.assign(Group=[f'G{i + 1}' for i in g])

Invoice No Date Text Vendor Days Group
0 1000001 1/1/2020 Rent Payment A 0 G1
1 1000003 2/1/2020 Rent Payment A 1 G1
2 1000005 4/1/2020 Rent Payment A 2 G1
3 1000007 6/1/2020 Water payment A 2 G2
4 1000008 9/2/2020 Rep Payment A 34 G3
5 1000010 9/2/2020 Car Payment A 0 G4
6 1000011 10/2/2020 Car Payment A 1 G4
7 1000012 15/2/2020 Car Payment A 5 G5
8 1000013 16/2/2020 Car Payment A 1 G5
9 1000015 17/2/2020 Car Payment A 1 G5
10 1000002 1/1/2020 Rent Payment B -47 G6
11 1000004 4/1/2020 Con Payment B 3 G7
12 1000006 6/1/2020 Con Payment B 2 G7
13 1000009 9/2/2020 Water payment B 34 G8
14 1000014 17/2/2020 Test Payment B 8 G9
15 1000016 19/2/2020 Test Payment B 2 G9
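
Both variants share the same two steps (build a run counter, then number the key combinations), so they could be wrapped in one reusable helper. The sketch below is only one possible packaging: the function name `add_group_labels` and its parameters `keys`, `value`, `threshold` and `prefix` are hypothetical, and it uses the ngroup variant. Only the three columns the logic needs are rebuilt from the question's data.

```python
import pandas as pd

def add_group_labels(df, keys, value, threshold=2, prefix="G"):
    """Return a copy of df with a 'Group' column (hypothetical helper).

    A new group starts whenever `value` rises by more than `threshold`
    relative to the previous row that has the same `keys`.
    """
    run = (
        df.groupby(keys)[value]
        .transform(lambda s: s.diff().fillna(0).gt(threshold).cumsum())
        .rename("_run")
    )
    # sort=False numbers the groups in order of first appearance.
    codes = df.groupby([*keys, run], sort=False).ngroup()
    return df.assign(Group=prefix + codes.add(1).astype(str))

# Text, Vendor and Days columns copied from the question's input table.
df = pd.DataFrame({
    "Text": (["Rent Payment"] * 3 + ["Water payment", "Rep Payment"]
             + ["Car Payment"] * 5 + ["Rent Payment"] + ["Con Payment"] * 2
             + ["Water payment"] + ["Test Payment"] * 2),
    "Vendor": ["A"] * 10 + ["B"] * 6,
    "Days": [0, 1, 2, 2, 34, 0, 1, 5, 1, 1, -47, 3, 2, 34, 8, 2],
})

out = add_group_labels(df, keys=["Text", "Vendor"], value="Days")
print(out)
```

With the default threshold of 2, the `Group` column matches the expected output in the question (G1 through G9).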

Some details

# x.diff()      adjacent differences within the group
# .fillna(0)    the first element of each group gets NaN, so we fill it in with 0
# .gt(2)        True where the difference is greater than 2 (should be obvious)
# .cumsum()     cumulatively summing True/False creates a new value every time
#               we see a True; this creates the groups
f = lambda x: x.diff().fillna(0).gt(2).cumsum()
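
To see each step in isolation, the chain can be applied one call at a time to the Days values of a single group. The sketch below uses the "Car Payment" / vendor A values from the question; the variable names `step1` to `step4` are only for illustration.

```python
import pandas as pd

# Days for the "Car Payment" / vendor A rows in the question.
days = pd.Series([0, 1, 5, 1, 1])

step1 = days.diff()      # NaN, 1.0, 4.0, -4.0, 0.0
step2 = step1.fillna(0)  # 0.0, 1.0, 4.0, -4.0, 0.0
step3 = step2.gt(2)      # False, False, True, False, False
step4 = step3.cumsum()   # 0, 0, 1, 1, 1

print(pd.DataFrame({"Days": days, "diff": step1, "filled": step2,
                    "jump": step3, "run": step4}))
```

Note that `.gt(2)` tests the signed difference, so the drop from 5 to 1 (a difference of -4) does not start a new group. That is what keeps the last three Car Payment rows together in G5, as in the expected output.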

Regarding "python - How to group by multiple columns based on a threshold and create a new column in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60657028/
