python - Unstack a dataframe using only the relevant columns

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:34:17

I have the following dataframe:

import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}

stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount', 'vat'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)

I get the following output:

                                amount           vat
product                              a   b   c     a    b     c
retailer_name store_number year
CRV           1946         2011      8   0   0  0.80  NaN   NaN
              1947         2012      6   0   0  0.60  NaN   NaN
                           2013      0   0  11   NaN  NaN  0.11
              1948         2011      6   1   0  0.60  0.1   NaN
              1949         2012     12   0   0  0.12  NaN   NaN
Walmart       1944         2010      5   0   0  0.50  NaN   NaN
              1945         2010      0   5   0   NaN  0.5   NaN
              1947         2010      0  10   0   NaN  0.1   NaN
              1949         2012      5   0   0  0.50  NaN   NaN

I don't need the vat columns in my final result. How can I drop them from the unstacked dataframe?

Best answer

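This works for me: after unstacking, select the top-level 'amount' column, which leaves only the product columns: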

df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')

df = df['amount'].fillna(0)
print(df)
product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0
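
Selecting df['amount'] also removes the 'amount' label from the column header. If the two-level header should survive, one possible variation (not part of the original answer) is to drop the 'vat' group by its level-0 label instead. The sketch below rebuilds the question's data so it runs on its own; the name amount_only is only illustrative.

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}

stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')

# Drop every column whose top-level label is 'vat';
# the remaining columns keep their ('amount', product) MultiIndex.
amount_only = df.drop(columns='vat', level=0).fillna(0)
print(amount_only)
```

Individual cells are then addressed with a tuple, for example amount_only[('amount', 'a')].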

All in one line:

df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')['amount'].fillna(0)
print(df)

product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0

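Another solution is to select only the amount column before calling sum, so the vat column is never aggregated or unstacked: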

df = stores.groupby(level=[0, 1, 2, 3])['amount'].sum().unstack('product').fillna(0)
print(df)
product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0
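
The amounts above print as floats because unstack first introduces NaN for missing products and fillna(0) only replaces them afterwards. A possible refinement (an assumption about what the asker might want, not something the original answer proposes) is to pass fill_value=0 to unstack, which should keep the integer dtype. The sketch rebuilds only the columns it needs; the name amount is illustrative.

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11]}

stores = pd.DataFrame(data).set_index(['retailer_name', 'store_number', 'year', 'product'])

# Sum only 'amount', then fill missing products with 0 during the unstack
# itself, so no NaN is ever created and the values are not upcast to float.
amount = (stores.groupby(level=[0, 1, 2, 3])['amount']
                .sum()
                .unstack('product', fill_value=0))
print(amount)
print(amount.dtypes)
```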

For more on python - Unstack a dataframe using only the relevant columns, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/37342414/
