gpt4 book ai didi

python - pandas 计算多列的汇总 DataFrame

转载 作者:行者123 更新时间:2023-12-01 02:06:57 24 4
gpt4 key购买 nike

我有一个包含各种产品、位置、Licence_ends 的文件,我需要计算按日期有多少产品已用完许可证,以及该季度有多少产品可以重新订购,示例数据如下:

   Item    Store    Category    Licence_ends    Available_to_reorder
0 A01929 North Office 2018 Q1 Yes
1 A02911 South Windows 2019 Q3 Yes
2 B11282 North Adobe 2019 Q2 No
3 C73162 East Office 2018 Q4 Yes
4 A12817 West Windows 2020 Q1 No

我想要实现的目标如下:

   Store    Category    2018 Q1 2018 Q2 ... 2020 Q4
0 East Windows 0 1 24 # cumulative sum of previous quarters
1 East Office 1 2 11
2 East Adobe 1 4 6
3 West Windows 2 2 18
4 West Office 0 0 0
...
11 South Adobe 1 0 12
12 Total All col.sum() col.sum() col.sum()

我从下面的代码开始,但我迷失了并且不知道正确的方法:

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from IPython.display import display
matplotlib.style.use('ggplot')

file = r'C:\Users\G01024822\Desktop\products.xlsx'
sheet = r'store_data'
data = pd.read_excel(file,sheetname=sheet,na_values='')
quarters = ['2018 Q1','2018 Q2','2018 Q3','2018 Q4','2019 Q1','2019 Q2','2019 Q3','2019 Q4','2020 Q1','2020 Q2','2020 Q3','2020 Q4']
categories = ['Windows','Office','Adobe']
stores = data['store'].unique().tolist()

mydata = {}
plandata = {}

for store in stores:
transaction = data[data['Store']==store]

for category in categories:
frame = transaction[transaction['Category']==category]
cycle = {}
if frame.shape[0] != 0:
for quarter in quarters:
temp = frame[frame['License_ends']==quarter]
out_of_licence = temp['Item'].count()
cycle[quarter] = out_of_licence



else:
pass

mydata[category] = cycle
df = pd.DataFrame.from_dict(mydata,orient='index')

df

这就是我正在生产的产品,但仅适用于最后一家商店:

enter image description here

分别针对每个类别。我尝试向空数据帧添加列表、系列、字典,我尝试追加、添加、分配,但没有得到我想要的。您能指出我正确的方向吗?

我在 SO 中尝试了大多数方法,还查看了 Wes Kinley 的书 @ Safari Books,但就是无法登陆。请帮忙。我必须在周一之前完成,但我完全无处可去。

最佳答案

考虑pivot_tableaggfunc 参数中使用 lambda 进行条件逻辑和。下面用随机数据进行演示,为可重复性提供种子,当然添加了开源类别。

数据

import numpy as np
import pandas as pd

np.random.seed(22)
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

df = pd.DataFrame({'Item': ["".join(list(np.random.choice(LETTERS,1)) +
[str(np.random.randint(1000, 9000))]) for _ in range(500)],
'Store': [np.random.choice(['North', 'South',
'East', 'West'],1).item(0) for _ in range(500)],
'Category': [np.random.choice(['Office', 'Windows',
'Adobe', 'Open Source'],1).item(0) for _ in range(500)],
'Licence_ends': ["Q".join([str(np.random.randint(2018, 2021))] +
[str(np.random.randint(1,4))]) for _ in range(500)],
'Available_to_reorder': [np.random.choice(['Yes', 'No'],1).item(0) for _ in range(500)]},
columns = ['Item', 'Store', 'Category', 'Licence_ends', 'Available_to_reorder'])

print(df.head())
# Item Store Category Licence_ends Available_to_reorder
# 0 V7276 West Open Source 2018Q2 Yes
# 1 M8104 West Windows 2020Q1 No
# 2 E6478 North Open Source 2019Q2 No
# 3 W5587 South Open Source 2018Q2 Yes
# 4 U3952 South Windows 2019Q3 No
# 5 E1989 East Office 2018Q1 No
# 6 S6646 West Windows 2019Q2 Yes
# 7 N7616 West Adobe 2019Q1 Yes
# 8 H6410 East Adobe 2020Q2 No
# 9 J8176 West Office 2020Q1 Yes

数据透视表 (结果为多索引数据框)

pvt_df = df.pivot_table(index=['Store', 'Category'], columns='Licence_ends', values='Available_to_reorder', 
aggfunc = lambda x: sum(x=='Yes'), margins=True, margins_name='Total')

print(pvt_df)
# Licence_ends 2018Q1 2018Q2 2018Q3 2019Q1 2019Q2 2019Q3 2020Q1 2020Q2 2020Q3 Total
# Store Category
# East Adobe 3.0 0.0 1.0 0.0 3.0 2.0 1.0 4.0 0.0 14
# Office 1.0 3.0 4.0 2.0 NaN 4.0 1.0 1.0 1.0 17
# Open Source 1.0 4.0 2.0 0.0 1.0 0.0 1.0 2.0 1.0 12
# Windows 1.0 2.0 3.0 1.0 1.0 0.0 1.0 3.0 1.0 13
# North Adobe 3.0 4.0 1.0 1.0 1.0 1.0 3.0 0.0 2.0 16
# Office 1.0 0.0 3.0 0.0 1.0 2.0 3.0 0.0 0.0 10
# Open Source 3.0 1.0 0.0 1.0 1.0 2.0 2.0 1.0 2.0 13
# Windows 2.0 2.0 5.0 0.0 2.0 2.0 1.0 1.0 3.0 18
# South Adobe 2.0 3.0 NaN 2.0 2.0 3.0 1.0 3.0 2.0 18
# Office 4.0 3.0 1.0 2.0 NaN 2.0 3.0 2.0 2.0 19
# Open Source 1.0 2.0 2.0 4.0 1.0 NaN NaN 3.0 2.0 15
# Windows 2.0 1.0 1.0 2.0 2.0 2.0 1.0 3.0 1.0 15
# West Adobe 1.0 1.0 0.0 4.0 3.0 3.0 1.0 0.0 3.0 16
# Office 1.0 1.0 3.0 3.0 3.0 2.0 2.0 2.0 1.0 18
# Open Source 4.0 2.0 4.0 0.0 0.0 4.0 1.0 1.0 2.0 18
# Windows 2.0 2.0 1.0 5.0 4.0 1.0 4.0 1.0 0.0 20
# Total 32.0 31.0 31.0 27.0 25.0 30.0 26.0 27.0 23.0 252

关于python - pandas 计算多列的汇总 DataFrame,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48963149/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com