
python - How do I stack aggregated information in a dataframe based on two columns?

Reposted. Author: 行者123. Updated: 2023-12-01 06:27:33

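I have a dataframe with one column whose row values I want to turn into columns of their own, with each row filled from the data in another column.

My starting dataframe is as follows: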

import pandas as pd

data = {'key': ['AAAA-27293', 'AAAA-27293','AAAA-27293','AAAA-27293','AAAA-27293','AAAA-27293','AAAA-27293', 'BBBBB-27296','BBBBB-27296','BBBBB-27296','BBBBB-27296','BBBBB-27296','BBBBB-27296','BBBBB-27296'],
'project_id': [ '105', '105','105','105','105','105','105', '107','107','107','107','107','107','107'],
'create_date': [ '2019-01-02', '2019-01-02','2019-01-02','2019-01-02','2019-01-02','2019-01-02','2019-01-02', '2019-01-16','2019-01-16','2019-01-16','2019-01-16','2019-01-16','2019-01-16','2019-01-16'],
'summary': ['Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email','Automated-email'],
'description': [ 'Output', 'Output','Output','Output','Output','Output','Output','Output','Output','Output','Output','Output','Output','Output'],
'field': [ 'issue', 'message reciever','message sender','checker','resolution','source','status','issue', 'message reciever','message sender','checker','resolution','source','status'],
'field_value': ['task','johnsmith@yahoo','jim@gmail','None','rejected','ABC123','resolved', 'job','ian@yahoo','johnharris@aol','None','completed','ABC432','resolved'],
}

df = pd.DataFrame(data,columns=['key','project_id','create_date','summary','description','field','field_value'])

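Within each key, every column except "field" and "field_value" holds the same value on every row. The "field" column has 7 unique values; I want each of them to become its own column, with each row filled with the corresponding value from "field_value".

The result I am hoping for is: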

data2 = {'key': ['AAAA-27293', 'BBBBB-27296'],
'project_id': [ '105', '107'],
'create_date': [ '2019-01-02','2019-01-16'],
'summary': ['Automated-email','Automated-email'],
'description': [ 'Output','Output'],
'issue': ['task','job'],
'message reciever': ['johnsmith@yahoo','ian@yahoo'],
'message sender': ['jim@gmail','johnharris@aol'],
'checker': ['None','None'],
'resolution': ['rejected','completed'],
'source': ['ABC123','ABC432'],
'status': ['resolved', 'resolved']
}

df2 = pd.DataFrame(data2,columns=['key','project_id','create_date','summary','description','issue','message reciever','message sender','checker','resolution','source','status'])

I tried the code below, but got an error:

df.set_index(['key','project_id','create_date','summary','description','field','field_value'],drop=True).unstack('field_value')
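
Whatever the exact error message was, one structural problem with this attempt can be checked directly: once every column is moved into the index, no data column is left to supply values for the new columns. The snippet below is a minimal sketch on a trimmed-down, hypothetical sample (three columns instead of seven), not the original data.

```python
import pandas as pd

# Hypothetical, trimmed-down sample with the same layout as the question's data.
df = pd.DataFrame({
    "key": ["AAAA-27293", "AAAA-27293", "BBBBB-27296", "BBBBB-27296"],
    "field": ["issue", "status", "issue", "status"],
    "field_value": ["task", "resolved", "job", "resolved"],
})

# Moving every column into the index leaves zero data columns behind.
all_in_index = df.set_index(["key", "field", "field_value"])
print(all_in_index.shape)  # (4, 0)

# Keeping 'field_value' out of the index leaves one column of values to spread.
values_kept = df.set_index(["key", "field"])
print(values_kept.shape)  # (4, 1)
```
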

Best answer

Try this: do not add field_value to the index, and call unstack with no arguments so that it unstacks the innermost index level, which is "field":

df.set_index(['key','project_id',
'create_date','summary',
'description', 'field'])['field_value'].unstack().reset_index()

Output:

|    | key         |   project_id | create_date   | summary         | description   | checker   | issue   | message reciever   | message sender   | resolution   | source   | status   |
|---:|:------------|-------------:|:--------------|:----------------|:--------------|:----------|:--------|:-------------------|:-----------------|:-------------|:---------|:---------|
| 0 | AAAA-27293 | 105 | 2019-01-02 | Automated-email | Output | None | task | johnsmith@yahoo | jim@gmail | rejected | ABC123 | resolved |
| 1 | BBBBB-27296 | 107 | 2019-01-16 | Automated-email | Output | None | job | ian@yahoo | johnharris@aol | completed | ABC432 | resolved |
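
For readers who want something they can run end to end, here is a minimal sketch of the same approach on a trimmed-down, hypothetical sample. The names `id_cols`, `wide`, and `wide_pivot` are illustrative and not part of the original answer. It also shows `DataFrame.pivot` as an equivalent way to write the reshape, assuming pandas 1.1 or later, where `index` may be a list of columns.

```python
import pandas as pd

# Hypothetical, trimmed-down sample with the same layout as the question's data.
df = pd.DataFrame({
    "key": ["AAAA-27293"] * 3 + ["BBBBB-27296"] * 3,
    "project_id": ["105"] * 3 + ["107"] * 3,
    "field": ["issue", "resolution", "status"] * 2,
    "field_value": ["task", "rejected", "resolved", "job", "completed", "resolved"],
})

id_cols = ["key", "project_id"]

# Same idea as the answer: index on the id columns plus 'field',
# select the value column, then unstack the innermost level ('field').
wide = (
    df.set_index(id_cols + ["field"])["field_value"]
      .unstack()
      .reset_index()
)
# unstack leaves 'field' as the name of the column axis; clear it for a plain frame.
wide.columns.name = None

# Alternative spelling of the same reshape (assumes pandas >= 1.1).
wide_pivot = df.pivot(index=id_cols, columns="field", values="field_value").reset_index()
wide_pivot.columns.name = None

print(wide)
```

Both forms assume that each combination of the id columns and "field" occurs only once; if duplicates exist, pandas raises a ValueError and an aggregating reshape such as `pivot_table` would be needed instead.
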

Regarding "python - How do I stack aggregated information in a dataframe based on two columns?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60060976/
