
python - Creating a new dataframe from an aggregated dataframe

Reposted. Author: 行者123. Updated: 2023-12-01 04:17:07

I have a dataframe that aggregates people by location, like this:

location_id | score | number_of_males | number_of_females
1 | 20 | 2 | 1
2 | 45 | 1 | 2

I want to create a new dataframe that un-aggregates this one, with one row per person, so that I get something like this:

location_id | score | number_of_males | number_of_females
1 | 20 | 1 | 0
1 | 20 | 1 | 0
1 | 20 | 0 | 1
2 | 45 | 1 | 0
2 | 45 | 0 | 1
2 | 45 | 0 | 1

Or, better yet:

location_id | score | sex
1 | 20 | male
1 | 20 | male
1 | 20 | female
2 | 45 | male
2 | 45 | female
2 | 45 | female

I was thinking of doing something like this:

import pandas as pd
aggregated_df = pd.DataFrame.from_csv(SOME_PATH)
unaggregated_df = df = pd.DataFrame(columns=['location_id', 'score', 'sex'])

for row in aggregated_df:
    for column in ['number_of_males', 'number_of_females']:
        for number_of_people in range(0, row[column]):
            if column == 'number_of_males':
                sex = 'male'
            else:
                sex = 'female'
            unaggregated_df.append([{'location_id': row['location_id'],
                                     'score': row['score'],
                                     'sex': sex}],
                                   ignore_index=True)

I am having trouble appending the dictionaries, even though pandas appears to support this.

Is there a more "pandthonic" (the pandas version of pythonic) way to accomplish this?
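
On the append problem itself: `for row in aggregated_df` iterates over column labels rather than rows, and `DataFrame.append` returned a new dataframe instead of modifying the existing one (the method was removed in pandas 2.0). A minimal sketch of a working loop follows. It assumes the sample data from the question in place of the CSV at SOME_PATH, and `sex_by_column` and `records` are names introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question (stands in for the CSV at SOME_PATH)
aggregated_df = pd.DataFrame({
    'location_id': [1, 2],
    'score': [20, 45],
    'number_of_males': [2, 1],
    'number_of_females': [1, 2],
})

# Hypothetical helper mapping: count column -> label for the new 'sex' column
sex_by_column = {'number_of_males': 'male', 'number_of_females': 'female'}

# Collect plain dicts first; building the dataframe once at the end avoids
# repeatedly copying a growing dataframe inside the loop
records = []
for _, row in aggregated_df.iterrows():  # iterrows() yields (index, row) pairs
    for column, sex in sex_by_column.items():
        for _ in range(int(row[column])):
            records.append({'location_id': row['location_id'],
                            'score': row['score'],
                            'sex': sex})

unaggregated_df = pd.DataFrame(records, columns=['location_id', 'score', 'sex'])
print(unaggregated_df)
```

This keeps the structure of the original attempt; it is likely to be slow on large inputs because it loops in Python once per person.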

Best answer

Here is a way to get the result using groupby:

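ids = ['location_id','score']

def foo(d):
    # In the sample data each (location_id, score) group holds a single row,
    # so read its counts as plain ints before repeating the labels
    n_males = int(d['number_of_males'].iloc[0])
    n_females = int(d['number_of_females'].iloc[0])
    return pd.Series(n_males*['male'] + n_females*['female'])

# df is the aggregated dataframe from the question
pd.melt(df.groupby(ids).apply(foo).reset_index(), id_vars=ids).drop(columns='variable')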

#Out[13]:
# location_id score value
#0 1 20 male
#1 2 45 male
#2 1 20 male
#3 2 45 female
#4 1 20 female
#5 2 45 female

Regarding "python - Creating a new dataframe from an aggregated dataframe", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34195014/
