python - Convert specific column values to headers and headers to column values

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:19:56

I have a CSV like this:

Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
7528       1       9           7           6.5         8
7528       2       6           3.2         2.2         3

I want to convert the CSV above into the following:

Month   Nrx_count  Nrx_quantity  Trx_count  Trx_quantity
Mar_19  9          6             6.5        2.2
Feb_19  7          3.2           8          3

I tried pivot and pd.melt, but I could not get the output shown above.

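Best answer

The idea is to move the columns without _ into the row index with DataFrame.set_index, split the remaining column names on the first _ so that the columns become a MultiIndex, reshape with DataFrame.stack, and then turn one index level back into columns with DataFrame.unstack: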

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
                         New  tot
Market_id Metric
7528      1      Feb_19  7.0  8.0
                 Mar_19  9.0  6.5
          2      Feb_19  3.2  3.0
                 Mar_19  6.0  2.2
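
To follow the steps above without a CSV file, the sample data can be built inline. This is only a sketch: the names wide and stacked are illustrative and not from the original answer, and the values are copied from the question. It prints the two-level column index produced by the split, before stack moves the month level into the row index.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read from CSV).
df = pd.DataFrame({
    'Market_id': [7528, 7528],
    'Metric': [1, 2],
    'New_Mar_19': [9, 6],
    'New_Feb_19': [7, 3.2],
    'tot_Mar_19': [6.5, 2.2],
    'tot_Feb_19': [8, 3],
})

# Keep the identifier columns out of the way as the row index.
wide = df.set_index(['Market_id', 'Metric'])

# Split each remaining name on the first underscore only:
# 'New_Mar_19' -> ('New', 'Mar_19'); expand=True returns a MultiIndex.
wide.columns = wide.columns.str.split('_', n=1, expand=True)
print(wide.columns.tolist())
# [('New', 'Mar_19'), ('New', 'Feb_19'), ('tot', 'Mar_19'), ('tot', 'Feb_19')]

# stack() moves the last column level (the month) into the row index.
stacked = wide.stack()
print(stacked)
```

The n=1 argument matters here: without it, 'New_Mar_19' would be split into three parts and the month would be broken up. Depending on the pandas version, the row order of the stacked result may differ and a FutureWarning about the stack implementation may be shown.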

#second level, here Metric is converted to columns
df1 = df1.unstack(1)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19    7.0    3.2    8.0    3.0
1       7528  Mar_19    9.0    6.0    6.5    2.2
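
The result above still uses generated names such as New_1, while the question asks for Month, Nrx_count, Nrx_quantity, Trx_count and Trx_quantity. A possible last step is a plain rename. The mapping in header_map below is an assumption, inferred only by matching the numbers in the question's expected output (New with Metric 1 is Nrx_count, and so on); the level names kind and Month are likewise illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Market_id': [7528, 7528],
    'Metric': [1, 2],
    'New_Mar_19': [9, 6],
    'New_Feb_19': [7, 3.2],
    'tot_Mar_19': [6.5, 2.2],
    'tot_Feb_19': [8, 3],
})

wide = df.set_index(['Market_id', 'Metric'])
wide.columns = wide.columns.str.split('_', n=1, expand=True)
# Name both column levels so the stacked level is called 'Month'
# instead of the default 'level_1' after reset_index().
wide.columns.names = ['kind', 'Month']

# Month goes to the rows, Metric goes to the columns.
flat = wide.stack('Month').unstack('Metric')
flat.columns = [f'{kind}_{metric}' for kind, metric in flat.columns]
flat = flat.reset_index()

# ASSUMPTION: mapping inferred from the values in the question's target table.
header_map = {
    'New_1': 'Nrx_count',
    'New_2': 'Nrx_quantity',
    'tot_1': 'Trx_count',
    'tot_2': 'Trx_quantity',
}
result = flat.rename(columns=header_map)[['Month', *header_map.values()]]
print(result)
```

Dropping Market_id in the last line only makes sense when the file holds a single market, as in the sample; with several markets it should stay in the selection.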

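#alternative: first level, here Market_id is converted to columns
#unstack needs the stacked DataFrame, not the flattened result above, so rebuild it first
df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack().unstack(0)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Metric       y  New_7528  tot_7528
0       1  Feb_19       7.0       8.0
1       1  Mar_19       9.0       6.5
2       2  Feb_19       3.2       3.0
3       2  Mar_19       6.0       2.2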
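
Since the question mentions pivot and pd.melt, here is a sketch of the same reshape with those tools. This is an alternative to the stack/unstack approach of the answer, not part of it, and the names long_df, column, kind, Month and value are illustrative. The frame is first melted to one row per value, the old header is split into its prefix and its month, and DataFrame.pivot_table spreads prefix and Metric back into columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Market_id': [7528, 7528],
    'Metric': [1, 2],
    'New_Mar_19': [9, 6],
    'New_Feb_19': [7, 3.2],
    'tot_Mar_19': [6.5, 2.2],
    'tot_Feb_19': [8, 3],
})

# One row per (Market_id, Metric, original header) combination.
long_df = df.melt(id_vars=['Market_id', 'Metric'],
                  var_name='column', value_name='value')

# 'New_Mar_19' -> kind='New', Month='Mar_19' (split on the first underscore only).
long_df[['kind', 'Month']] = long_df['column'].str.split('_', n=1, expand=True)

# Months become rows; each (kind, Metric) pair becomes one column.
out = long_df.pivot_table(index=['Market_id', 'Month'],
                          columns=['kind', 'Metric'],
                          values='value', aggfunc='mean')
out.columns = [f'{kind}_{metric}' for kind, metric in out.columns]
out = out.reset_index()
print(out)
```

Because pivot_table always aggregates, this variant does not raise on duplicated Market_id and Metric pairs; it silently averages them, which may or may not be what is wanted.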

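Edit:

If you get this error:

ValueError: Index contains duplicate entries, cannot reshape

it means there are duplicates in the index levels being reshaped. The solution is the same, you only need to add an aggregation, for example mean or sum: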

#changed sample data: duplicated Market_id, Metric pairs (the columns used for the MultiIndex)
print (df)
   Market_id  Metric  New_Mar_19  New_Feb_19  tot_Mar_19  tot_Feb_19
0       7528       1           2         5.0         5.0           8
1       7528       1          10        20.0         2.5          16
2       7528       2           6         3.2         2.2           3

df1 = df.set_index(['Market_id','Metric'])
df1.columns = df1.columns.str.split('_', n=1, expand=True)
df1 = df1.stack()
print (df1)
                          New   tot
Market_id Metric
7528      1      Feb_19   5.0   8.0
                 Mar_19   2.0   5.0
                 Feb_19  20.0  16.0
                 Mar_19  10.0   2.5
          2      Feb_19   3.2   3.0
                 Mar_19   6.0   2.2

#number of index levels to pass to groupby - here 3 levels -> 0,1,2
print (df1.index.nlevels)
3

#group by all levels and aggregate with mean, sum, max...
df1 = df1.groupby(level=[0,1,2]).mean().unstack(1)
print (df1)
                   New         tot
Metric               1    2      1    2
Market_id
7528      Feb_19  12.5  3.2  12.00  3.0
          Mar_19   6.0  6.0   3.75  2.2

df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index().rename(columns={'level_1':'y'})
print (df1)
   Market_id       y  New_1  New_2  tot_1  tot_2
0       7528  Feb_19   12.5    3.2  12.00    3.0
1       7528  Mar_19    6.0    6.0   3.75    2.2
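
Before choosing an aggregation, it can help to look at the rows that cause the duplicate-entries error. The sketch below is an illustration and not part of the original answer; the names keys, dupes and collapsed are hypothetical. It lists the rows that share a Market_id and Metric pair, then collapses them with a mean before any reshaping, which for this sample gives the same numbers as grouping after stack.

```python
import pandas as pd

# Sample data with a duplicated (Market_id, Metric) pair, copied from the answer.
df = pd.DataFrame({
    'Market_id': [7528, 7528, 7528],
    'Metric': [1, 1, 2],
    'New_Mar_19': [2, 10, 6],
    'New_Feb_19': [5.0, 20.0, 3.2],
    'tot_Mar_19': [5.0, 2.5, 2.2],
    'tot_Feb_19': [8, 16, 3],
})

keys = ['Market_id', 'Metric']

# keep=False marks every row of a duplicated key, not only the repeats.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# Collapse duplicated keys up front; swap mean() for sum(), max(), ... as needed.
collapsed = df.groupby(keys, as_index=False).mean()
print(collapsed)
```

After this step each Market_id and Metric pair is unique, so the set_index, stack and unstack chain runs without the extra groupby.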

Regarding python - Convert specific column values to headers and headers to column values, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56239660/
