
python - Label column based on grouped data

Reposted. Author: 太空狗. Updated: 2023-10-30 02:06:55

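I am trying to create a column that holds a single value per ID (each ID has many rows associated with it). If any row for an ID is labelled answered, every row for that ID should be labelled answered. If all rows for an ID are labelled not_answered, every row should stay not_answered (this is what currently happens).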

Here is the code I wrote:

import numpy as np

# Label each row by whether data__answered_at is missing
conds = [file.data__answered_at.isna(), file.data__answered_at.notna()]
choices = ["not_answered", "answered"]
file['call_status'] = np.select(conds, choices, default=np.nan)

data__id call_status rank
1 answered 1
1 not_answered 2
1 answered 3
2 not_answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 not_answered 2
5 not_answered 1
5 not_answered 2

In this case, the desired result is:

data__id call_status rank
1 answered 1
1 answered 2
1 answered 3
2 answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 answered 2
5 not_answered 1
5 not_answered 2

Best answer

Use GroupBy.transform with GroupBy.any to test whether each group contains at least one answered value, then set the values with DataFrame.loc:

mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')
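
To make the mask concrete, here is a minimal sketch that rebuilds the question's sample data and inspects the result of transform('any'). The DataFrame construction is an assumption, since the original post does not show how df was created.

```python
import pandas as pd

# Sample data rebuilt from the question's table (construction is assumed).
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# Row-level test: is this particular row labelled "answered"?
is_answered = df["call_status"].eq("answered")

# Group-level test broadcast back to every row: True for each row whose
# data__id has at least one "answered" row.
mask = is_answered.groupby(df["data__id"]).transform("any")

print(df.assign(row_answered=is_answered, group_answered=mask))
```

Because transform returns a Series aligned with the original index, the mask can be passed directly to DataFrame.loc without any merge or reindexing.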

Alternatively, collect every data__id that has an answered row and test membership with Series.isin:

mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())

df.loc[mask, 'call_status'] = 'answered'
print(df)
data__id call_status rank
0 1 answered 1
1 1 answered 2
2 1 answered 3
3 2 answered 1
4 2 answered 2
5 3 not_answered 1
6 4 answered 1
7 4 answered 2
8 5 not_answered 1
9 5 not_answered 2
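
If the raw data__answered_at column from the question is still available, the same group-wise test can probably be applied to it directly, skipping the intermediate row-level label. The following is only a sketch under that assumption: the timestamps are invented for illustration, and np.where is swapped in for the question's np.select.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: the timestamps are made up for illustration only.
file = pd.DataFrame({
    "data__id": [1, 1, 2, 3],
    "data__answered_at": pd.to_datetime(
        ["2023-01-01 10:00", None, None, None]
    ),
})

# True for every row whose data__id has at least one non-missing timestamp.
any_answered = (
    file["data__answered_at"]
    .notna()
    .groupby(file["data__id"])
    .transform("any")
)

# Assign the group-level label in a single step.
file["call_status"] = np.where(any_answered, "answered", "not_answered")
print(file)
```

This keeps the labelling in one pass, at the cost of no longer having the per-row status available for inspection.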

A similar question on this topic (python - label column based on grouped data) can be found on Stack Overflow: https://stackoverflow.com/questions/57092032/
