
python - Pandas: how to combine rows grouped by name, keeping 'yes' in a column if any row of the group has the value 'yes'

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:57:39

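I need to merge rows that share the same value in the "name" column. In some rows the table has different columns set to "yes", as shown below.

The template below gives the input and the expected output: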

name    department  feature1    feature2    feature3
x1      cs                      yes         yes
x1      cs          yes
x1      ec
x2      cs          yes         yes
x2      ec                      yes

The output I need is:

x1      cs          yes         yes         yes
x1      ec
x2      cs          yes         yes
x2      ec                      yes

Please suggest a solution using python and pandas.

Best answer

You can use:

import numpy as np
import pandas as pd

# keep only the `yes` values: convert the feature columns to booleans
cols = df.columns.difference(['name','department'])
df[cols] = df[cols] == 'yes'
print (df)
  name department  feature1  feature2  feature3
0   x1         cs     False      True      True
1   x1         cs      True     False     False
2   x1         ec     False     False     False
3   x2         cs      True      True     False
4   x2         ec     False      True     False

Then use groupby with the aggregation GroupBy.max, and finally replace the booleans through a dictionary:

df = df.groupby(['name','department']) \
       .max() \
       .replace({True:'yes',False:np.nan}) \
       .reset_index()

print (df)
  name department feature1 feature2 feature3
0   x1         cs      yes      yes      yes
1   x1         ec      NaN      NaN      NaN
2   x2         cs      yes      yes      NaN
3   x2         ec      NaN      yes      NaN
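The two steps above can be put together into one self-contained sketch. The input frame is rebuilt here from the question's table, and the names flags and merged are only illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Input rebuilt from the question's table (empty cells become NaN)
df = pd.DataFrame({
    'name':       ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':   [np.nan, 'yes', np.nan, 'yes', np.nan],
    'feature2':   ['yes', np.nan, np.nan, 'yes', 'yes'],
    'feature3':   ['yes', np.nan, np.nan, np.nan, np.nan],
})

# Every column except the grouping keys is treated as a feature column
cols = df.columns.difference(['name', 'department'])

# Step 1: turn 'yes' into True and everything else into False
flags = df.copy()
flags[cols] = flags[cols] == 'yes'

# Step 2: max per group is True if any row in the group is True,
# then map the booleans back to 'yes' / NaN
merged = (flags.groupby(['name', 'department'])
               .max()
               .replace({True: 'yes', False: np.nan})
               .reset_index())

print(merged)
```

Working on a copy keeps the original df untouched, which is convenient if you want to compare the alternatives below on the same input.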

Thanks to AChampion for the comment: DataFrameGroupBy.any can also be used in place of max on the boolean columns:

df = df.groupby(['name','department']) \
       .any() \
       .replace({True:'yes',False:np.nan}) \
       .reset_index()

print (df)
  name department feature1 feature2 feature3
0   x1         cs      yes      yes      yes
1   x1         ec      NaN      NaN      NaN
2   x2         cs      yes      yes      NaN
3   x2         ec      NaN      yes      NaN
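On boolean columns, max and any should give the same result, since the maximum of a group of booleans is True exactly when at least one of them is True. A small check, assuming a flags frame shaped like the boolean frame printed earlier (the names flags, by_max and by_any are illustrative):

```python
import pandas as pd

# Boolean frame, same values as the one printed after the == 'yes' step
flags = pd.DataFrame({
    'name':       ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':   [False, True, False, True, False],
    'feature2':   [True, False, False, True, True],
    'feature3':   [True, False, False, False, False],
})

grouped = flags.groupby(['name', 'department'])
by_max = grouped.max()   # True if the largest value in the group is True
by_any = grouped.any()   # True if any value in the group is True

# Both aggregations agree cell by cell on boolean input
print((by_max == by_any).all().all())
```

any states the intent ("is there at least one yes?") more directly, which is probably why it was suggested in the comment.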

If all values are only yes or NaN, this also works (applied to the original frame, without the boolean conversion):

df = df.fillna('').groupby(['name', 'department']).max().reset_index()
print (df)
  name department feature1 feature2 feature3
0   x1         cs      yes      yes      yes
1   x1         ec
2   x2         cs      yes      yes
3   x2         ec               yes
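Here is a runnable sketch of the fillna variant, again with the input rebuilt from the question's table. It relies on string comparison: the empty string sorts before 'yes', so max picks 'yes' whenever the group contains one. That assumption only holds while the columns contain nothing but yes and NaN:

```python
import numpy as np
import pandas as pd

# Original yes/NaN frame rebuilt from the question (no boolean conversion)
df = pd.DataFrame({
    'name':       ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':   [np.nan, 'yes', np.nan, 'yes', np.nan],
    'feature2':   ['yes', np.nan, np.nan, 'yes', 'yes'],
    'feature3':   ['yes', np.nan, np.nan, np.nan, np.nan],
})

# '' < 'yes' in string ordering, so the per-group max is 'yes'
# if any row has it and '' otherwise
merged = (df.fillna('')
            .groupby(['name', 'department'])
            .max()
            .reset_index())

print(merged)
```

Note that the missing cells end up as empty strings rather than NaN, as in the printed output above.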

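Edit:

If the frame has additional columns, you can build a custom dict of aggregation functions with a dict comprehension and pass it to DataFrameGroupBy.agg: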

d = {'feature3': ['yes', np.nan, np.nan, np.nan, np.nan],
     'feature2': ['yes', np.nan, np.nan, 'yes', 'yes'],
     'name': ['x1', 'x1', 'x1', 'x2', 'x2'],
     'count': [10.0, 30.0, np.nan, 20.0, 3.0],
     'feature1': [np.nan, 'yes', np.nan, 'yes', np.nan],
     'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
     'description': ['xsdepartment1', 'xsdepartment2', np.nan, 'department1', 'department3']}

c = ['name','department','feature1','feature2','feature3','count','description']
df = pd.DataFrame(d, columns = c)
print (df)
  name department feature1 feature2 feature3  count    description
0   x1         cs      NaN      yes      yes   10.0  xsdepartment1
1   x1         cs      yes      NaN      NaN   30.0  xsdepartment2
2   x1         ec      NaN      NaN      NaN    NaN            NaN
3   x2         cs      yes      yes      NaN   20.0    department1
4   x2         ec      NaN      yes      NaN    3.0    department3

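# feature columns: everything except the keys and the extra columns
cols = df.columns.difference(['name','department','count','description'])

# collect all values of a group into a tuple
f = lambda x: tuple(x)
# 'max' for every feature column, tuple collection for the extra columns
d = {x:'max' for x in cols}
d['count'] = f
d['description'] = f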
print (d)
{'feature3': 'max',
'feature1': 'max',
'feature2': 'max',
'description': <function <lambda> at 0x000000000F6FC598>,
'count': <function <lambda> at 0x000000000F6FC598>}
df[cols] = df[cols] == 'yes'
print (df)
  name department  feature1  feature2  feature3  count    description
0   x1         cs     False      True      True   10.0  xsdepartment1
1   x1         cs      True     False     False   30.0  xsdepartment2
2   x1         ec     False     False     False    NaN            NaN
3   x2         cs      True      True     False   20.0    department1
4   x2         ec     False      True     False    3.0    department3

df = df.groupby(['name', 'department']).agg(d).reset_index()
df[cols] = df[cols].replace({True:'yes',False:np.nan})
print (df)
  name department feature3 feature1 feature2                     description  \
0   x1         cs      yes      yes      yes  (xsdepartment1, xsdepartment2)
1   x1         ec      NaN      NaN      NaN                          (nan,)
2   x2         cs      NaN      yes      yes                  (department1,)
3   x2         ec      NaN      NaN      yes                  (department3,)

          count
0  (10.0, 30.0)
1        (nan,)
2       (20.0,)
3        (3.0,)
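In the output above, groups whose extra columns are entirely missing produce (nan,). If that is not wanted, one possible variation is to drop the missing values before building the tuple. This is only a sketch: the helper name non_null_tuple is hypothetical, and the frame is reduced to the key columns plus count and description to keep it short:

```python
import numpy as np
import pandas as pd

# Reduced frame: grouping keys plus the two extra columns from the edit
df = pd.DataFrame({
    'name':        ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department':  ['cs', 'cs', 'ec', 'cs', 'ec'],
    'count':       [10.0, 30.0, np.nan, 20.0, 3.0],
    'description': ['xsdepartment1', 'xsdepartment2', np.nan,
                    'department1', 'department3'],
})

def non_null_tuple(values):
    """Hypothetical helper: collect a group's non-missing values into a tuple."""
    return tuple(values.dropna())

collected = (df.groupby(['name', 'department'])
               .agg({'count': non_null_tuple, 'description': non_null_tuple})
               .reset_index())

print(collected)
```

With this helper the (x1, ec) group yields an empty tuple for both columns instead of (nan,). The feature columns can still be aggregated with 'max' in the same dict, as in the answer.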

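Regarding "python - Pandas: how to combine rows grouped by name, keeping 'yes' in a column if any row of the group has the value 'yes'", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45396562/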
