python - Applying a conditional COUNTIF to a pandas DataFrame results in NaN

Reposted. Author: 行者123. Updated: 2023-12-01 00:20:25

If this is a duplicate, please link me to the duplicate. I did not find any other post that answers my question.

I have a DataFrame, knn_res, with the following dimensions and data:

            username  Prediction  is_bot
0 megliebsch 1 0
1 megliebsch 1 0
2 megliebsch 1 0
3 megliebsch 1 0
4 megliebsch 1 0
... ... ... ...
1220 ARTHCLAUDIA 1 1
1221 ARTHCLAUDIA 1 1
1222 ARTHCLAUDIA 1 1
1223 ARTHCLAUDIA 1 1
1224 ASSUNCAOWALLAS 1 1

[1225 rows x 3 columns]

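What I want to do is, for each username, count the number of rows where prediction = 1 and the number where prediction = 0, and create two new columns from those values. For example, given the following dataset: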

| username | prediction | is_bot |
|:--------:|:----------:|:------:|
| foo | 1 | 1 |
| foo | 1 | 1 |
| foo | 1 | 1 |
| foo | 0 | 1 |
| foo | 0 | 1 |
| foo1 | 0 | 1 |
| foo1 | 0 | 1 |
| foo1 | 0 | 0 |
| foo1 | 0 | 0 |
| foo1 | 1 | 0 |
| foo1 | 1 | 0 |
| foo1 | 0 | 0 |
| foo2 | 1 | 0 |
| foo2 | 1 | 0 |
| foo2 | 1 | 1 |

I want:

| username | count_bot  | count_human |
|:--------:|:----------:|:-----------:|
| foo | 3 | 2 |
| foo1 | 2 | 5 |
| foo2 | 3 | 0 |

where the following logic applies:

For each row, if Prediction == 1, then increase the count_bot counter. If Prediction == 0, then increase the count_human counter. Then, append the totals for each row and group by.

So far, I have tried referring to this question and attempted the following:

knn_res['count_bot'] = knn_res[knn_res.Prediction == 1].count()
print(knn_res)

Which yields:

            username  Prediction  is_bot  count_bot
0 megliebsch 1 0 NaN
1 megliebsch 1 0 NaN
2 megliebsch 1 0 NaN
3 megliebsch 1 0 NaN
4 megliebsch 1 0 NaN
... ... ... ... ...
1220 ARTHCLAUDIA 1 1 NaN
1221 ARTHCLAUDIA 1 1 NaN
1222 ARTHCLAUDIA 1 1 NaN
1223 ARTHCLAUDIA 1 1 NaN
1224 ASSUNCAOWALLAS 1 1 NaN

I also tried:

new = knn_res.groupby('username').sum()
print(new)

Which yields:

                 Prediction  is_bot
username
666STEVEROGERS 25 25
ADELE_BROCK 1 25
ADRIANAMFTTT 24 25
AHMADRADJAB 1 25
ALBERTA_HAYNESS 24 25
ALTMANBELINDA 23 25
ALVA_MC_GHEE 25 25
ANGELITHSS 25 25
ANN1EMCCONNELL 25 25
ANWARJAMIL22 25 25
AN_N_GASTON 25 25
ARONHOLDEN8 25 25
ARTHCLAUDIA 25 25
ASSUNCAOWALLAS 1 1
BECCYWILL 9 25
BELOZEROVNIKIT 17 25
BEN_SAR_GENT 1 25
BERT_HENLEY 24 25
BISHOLORINE 25 25
BLACKERTHEBERR5 11 25
BLACKTIVISTSUS 7 25
BLACK_ELEVATION 24 25
BOGDANOVAO2 7 25
BREMENBOTE 10 25
B_stever96 1 0
CALIFRONIAREP 24 25
C_dos_94 25 24
Cassidygirly 25 0
ChuckSpeaks_ 25 0
Cyabooty 0 0
DurkinSays 1 0
LSU_studyabroad 24 0
MisMonWEXP 0 0
NextLevel_Mel 25 0
PeterDuca 24 0
ShellMarcel 25 0
Sir_Fried_Alott 25 0
XavierRivera_ 0 0
ZacharyFlair 0 0
brentvarney44 1 0
cbars68 0 0
chloeschultz11 25 0
hoang_le_96 1 0
kdougherty178 25 0
lasallephilo 0 0
lovely_cunt_ 1 0
megliebsch 24 0
msimps_15 24 0
okweightlossdna 24 0
tankthe_hank 24 0

What am I doing wrong in trying to get my desired result?

Best answer

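Grouping by username and prediction puts rows that share the same username and prediction values into the same group, so prediction 0 and prediction 1 end up in separate groups for each username. Call count on each group (note: I select prediction rather than is_bot before calling count, since that is what you want). Finally, unstack moves 0 and 1 into columns, and rename gives them the names you want.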

df_out = (df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[30]:
prediction count_human count_bot
username
foo 2 3
foo1 5 2
foo2 0 3
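
To make the snippet above reproducible, here is a minimal sketch that rebuilds the question's example table as df and runs the same chain. The DataFrame construction and the pd.crosstab cross-check are additions for illustration and are not part of the original answer. Note that the answer uses a lowercase prediction column, as in the example table; for knn_res the column is named Prediction.

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame({
    "username": ["foo"] * 5 + ["foo1"] * 7 + ["foo2"] * 3,
    "prediction": [1, 1, 1, 0, 0,
                   0, 0, 0, 0, 1, 1, 0,
                   1, 1, 1],
    "is_bot": [1, 1, 1, 1, 1,
               1, 1, 0, 0, 0, 0, 0,
               0, 0, 1],
})

# Same chain as the answer: count rows per (username, prediction),
# pivot the prediction level into columns, then rename the columns.
df_out = (
    df.groupby(["username", "prediction"])["prediction"]
      .count()
      .unstack(fill_value=0)
      .rename({0: "count_human", 1: "count_bot"}, axis=1)
)

# Cross-check with pd.crosstab (an alternative, not the answer's method).
check = pd.crosstab(df["username"], df["prediction"])

print(df_out)
print(check)
```

Both tables should contain the same counts; crosstab simply keeps 0 and 1 as the column labels.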

Step by step:

groupby on each username and prediction pair, and count the rows in each 0 and 1 group for each username

df.groupby(['username', 'prediction']).prediction.count()

Out[32]:
username prediction
foo 0 2
1 3
foo1 0 5
1 2
foo2 1 3
Name: prediction, dtype: int64

unstack to move the prediction index level into columns

df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0)

Out[33]:
prediction 0 1
username
foo 2 3
foo1 5 2
foo2 0 3
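
The fill_value=0 argument matters here because foo2 has no rows with prediction 0, so that combination is missing from the grouped counts. The sketch below uses a small hypothetical Series (not the question's data) shaped like the grouped counts to compare unstack with and without fill_value.

```python
import pandas as pd

# Hypothetical miniature of the grouped counts: "foo2" has no
# (username, prediction == 0) entry, just like in the answer's output.
counts = pd.Series(
    [2, 3, 3],
    index=pd.MultiIndex.from_tuples(
        [("foo", 0), ("foo", 1), ("foo2", 1)],
        names=["username", "prediction"],
    ),
    name="prediction",
)

without_fill = counts.unstack()           # missing combination becomes NaN
with_fill = counts.unstack(fill_value=0)  # missing combination becomes 0

print(without_fill)
print(with_fill)
```

Without fill_value the missing cell is NaN, which would reintroduce the kind of NaN the question is trying to avoid.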

Finally, rename the columns to match your desired output

(df.groupby(['username', 'prediction']).prediction.count().unstack(fill_value=0).
rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

Out[34]:
prediction count_human count_bot
username
foo 2 3
foo1 5 2
foo2 0 3
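
As for why the first attempt in the question fills count_bot with NaN: this is most likely index alignment. The sketch below uses a tiny hypothetical stand-in for knn_res (three made-up rows, not the real data) to show that count() on a filtered DataFrame returns one value per column, labelled by column name, and that assigning it as a new column aligns on row labels, which never match.

```python
import pandas as pd

# Hypothetical miniature stand-in for knn_res (made-up rows).
knn_res = pd.DataFrame({
    "username": ["a", "a", "b"],
    "Prediction": [1, 0, 1],
    "is_bot": [0, 0, 1],
})

# count() on a filtered DataFrame returns one number per column,
# indexed by the column names rather than by the row labels.
per_column = knn_res[knn_res.Prediction == 1].count()
print(per_column.index.tolist())  # ['username', 'Prediction', 'is_bot']

# Assigning that Series as a column aligns it on the row labels 0, 1, 2.
# None of them match the column-name labels, so every cell becomes NaN.
attempt = knn_res.copy()
attempt["count_bot"] = per_column
print(attempt["count_bot"].isna().all())  # True
```

The groupby approach in the answer avoids this because the counts are indexed by username from the start.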

Regarding python - Applying a conditional COUNTIF to a pandas DataFrame results in NaN, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59004187/
