
python - Complex splitting, merging, and pivoting of multiple DataFrames in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:28:18

I have two pandas DataFrames that have to be merged and reshaped. In one of them, a column holds comma-separated strings. The DataFrames are:

import pandas as pd
import numpy as np

tableA = [(100, 'chocolate, sprinkles'),
          (101, 'chocolate, sprinkles'),
          (102, 'glazed')]
labels = ['product', 'tags']
dfA = pd.DataFrame.from_records(tableA, columns=labels)

tableB = [('A', 100),
          ('A', 101),
          ('B', 101),
          ('C', 100),
          ('C', 102),
          ('B', 101),
          ('A', 100),
          ('C', 102)]
labels = ['customer', 'product']
dfB = pd.DataFrame.from_records(tableB, columns=labels)

dfA:
   product                  tags
0      100  chocolate, sprinkles
1      101  chocolate, sprinkles
2      102                glazed

dfB:
  customer  product
0        A      100
1        A      101
2        B      101
3        C      100
4        C      102
5        B      101
6        A      100
7        C      102

The result has to look like this:

customer  sprinkles  chocolate  glazed
A                 ?          ?       ?
B                 ?          ?       ?
C                 ?          ?       ?

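I have tried various functions, but none of them worked. Any suggestions would be much appreciated!

Here is some of my code. I know it does not get me all the way there, but it should give you an idea of what I am trying to do: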

dfC=dfB.merge(dfA, left_on='product', right_on='product')
print(dfC)

which results in:

  customer  product                  tags
0        A      100  chocolate, sprinkles
1        C      100  chocolate, sprinkles
2        A      100  chocolate, sprinkles
3        A      101  chocolate, sprinkles
4        B      101  chocolate, sprinkles
5        B      101  chocolate, sprinkles
6        C      102                glazed
7        C      102                glazed

And then:

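# Split on ', ' (comma plus space) so the tags carry no leading whitespace
dfS = pd.DataFrame(dfC.tags.str.split(', ').tolist(), index=dfC.customer).stack()
# Keep only the customer and the tag value; drop the positional level
dfS = dfS.reset_index()[['customer', 0]]
dfS.columns = ['var1', 'var2']
print(dfS)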

which results in:

   var1       var2
0     A  chocolate
1     A  sprinkles
2     C  chocolate
3     C  sprinkles
4     A  chocolate
5     A  sprinkles
6     A  chocolate
7     A  sprinkles
8     B  chocolate
9     B  sprinkles
10    B  chocolate
11    B  sprinkles
12    C     glazed
13    C     glazed

Best answer

With the combined DataFrame dfS, you can use pd.crosstab to count how many times each customer used each tag:

pd.crosstab(dfS.var1, dfS.var2)

var2  chocolate  glazed  sprinkles
var1
A             3       0          3
B             2       0          2
C             1       2          1
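
For reference, the whole pipeline from the question (merge, split the tags, count per customer) can be sketched end to end. This is only one possible way to do it, not the method from the answer above: it assumes pandas 0.25 or newer, where DataFrame.explode is available, and that the tags are always separated by a comma followed by one space. The names merged, long_form and counts are made up for this illustration.

```python
import pandas as pd

# Same sample data as in the question
dfA = pd.DataFrame({
    'product': [100, 101, 102],
    'tags': ['chocolate, sprinkles', 'chocolate, sprinkles', 'glazed'],
})
dfB = pd.DataFrame({
    'customer': ['A', 'A', 'B', 'C', 'C', 'B', 'A', 'C'],
    'product': [100, 101, 101, 100, 102, 101, 100, 102],
})

# Attach the tag string to every purchase
merged = dfB.merge(dfA, on='product')

# Turn 'chocolate, sprinkles' into ['chocolate', 'sprinkles'] (assumes ', ' separator)
merged['tag'] = merged['tags'].str.split(', ')

# One row per (purchase, tag) pair; requires pandas >= 0.25
long_form = merged.explode('tag')

# Count how often each customer bought something carrying each tag
counts = pd.crosstab(long_form['customer'], long_form['tag'])
print(counts)
```

If the separator is not consistent in real data, stripping whitespace from each tag after the split would probably be safer than relying on ', '.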

This post on "python - Complex splitting, merging, and pivoting of multiple DataFrames in Pandas" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/49822030/
