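I've been searching around trying to work out how to sort my pivot table correctly, but I haven't had any luck. My data (vp_clients) looks like this: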
client unit task hours month
0 A DVADA Account Management 6.50 January
1 A DVADA Buying 1.25 January
2 A DVADA Meeting / Call 0.50 January
3 A DVADA Account Management 3.00 January
4 A DVADA Billing 2.50 February
5 A DVADA Account Management 6.50 February
6 A DVADA Buying 1.25 February
7 A DVADA Meeting / Call 0.50 February
8 A DVADA Account Management 3.00 February
9 A DVADA Billing 2.50 February
10 A DVADA Billing 2.50 December
11 A DVADA Account Management 6.50 December
12 A DVADA Buying 1.25 December
13 A DVADA Meeting / Call 0.50 December
14 A DVADA Account Management 3.00 December
15 A DVADA Billing 2.50 December
16 A DVADA Account Management 6.50 August
17 A DVADA Buying 1.25 August
18 A DVADA Meeting / Call 0.50 August
19 A DVADA Account Management 3.00 August
20 A DVADA Account Management 6.50 April
21 A DVADA Buying 1.25 April
22 A DVADA Meeting / Call 0.50 April
23 A DVADA Account Management 3.00 April
24 B DVADA Account Management 6.50 January
25 B DVADA Buying 1.25 January
26 B DVADA Meeting / Call 0.50 January
27 B DVADA Account Management 3.00 January
28 B DVADA Billing 2.50 February
29 B DVADA Account Management 6.50 February
30 B DVADA Buying 1.25 February
31 B DVADA Meeting / Call 0.50 February
32 B DVADA Account Management 3.00 February
33 B DVADA Billing 2.50 February
34 B DVADA Billing 2.50 December
35 B DVADA Account Management 6.50 December
36 B DVADA Buying 1.25 December
37 B DVADA Meeting / Call 0.50 December
38 B DVADA Account Management 3.00 December
39 B DVADA Billing 2.50 December
40 B DVADA Account Management 6.50 August
41 B DVADA Buying 1.25 August
42 B DVADA Meeting / Call 0.50 August
43 B DVADA Account Management 3.00 August
44 B DVADA Account Management 6.50 April
45 B DVADA Buying 1.25 April
46 B DVADA Meeting / Call 0.50 April
47 C DVADA Account Management 3.00 April
48 C DVADA Account Management 6.50 January
49 C DVADA Buying 1.25 January
50 C DVADA Meeting / Call 0.50 January
51 C DVADA Account Management 3.00 January
52 C DVADA Billing 2.50 February
53 C DVADA Account Management 6.50 February
54 C DVADA Buying 1.25 February
55 C DVADA Meeting / Call 0.50 February
56 C DVADA Account Management 3.00 February
57 C DVADA Billing 2.50 February
58 C DVADA Billing 2.50 December
59 C DVADA Account Management 6.50 December
60 C DVADA Buying 1.25 December
61 C DVADA Meeting / Call 0.50 December
62 C DVADA Account Management 3.00 December
63 C DVADA Billing 2.50 December
64 C DVADA Account Management 6.50 August
65 C DVADA Buying 1.25 August
66 C DVADA Meeting / Call 0.50 August
67 C DVADA Account Management 3.00 August
68 C DVADA Account Management 6.50 April
69 C DVADA Buying 1.25 April
70 C DVADA Meeting / Call 0.50 April
71 C DVADA Account Management 3.00 April
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)
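This returns a pivot table with three columns (client, month, hours). Each client has 12 months (January to December), each with an hours value, but the months come out in alphabetical order: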
hours
client month
A April 203.50
August 227.75
December 159.75
February 203.25
January 199.25
B April 203.50
August 227.75
December 159.75
February 203.25
January 199.25
C April 203.50
August 227.75
December 159.75
February 203.25
January 199.25
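Why does April come first? With plain string labels, pivot_table groups with sorting enabled, so the month level is ordered lexicographically rather than by calendar. Below is a minimal sketch on a hypothetical five-row subset of client A's rows:

```python
import pandas as pd

# Hypothetical five-row subset of client A's rows from the question.
vp_clients = pd.DataFrame({
    "client": ["A", "A", "A", "A", "A"],
    "hours": [6.50, 2.50, 2.50, 1.25, 0.50],
    "month": ["January", "February", "December", "August", "April"],
})

# Month labels are plain strings, so pivot_table's sorted groupby
# orders them alphabetically, not by calendar position.
df = pd.pivot_table(vp_clients, values="hours", index=["client", "month"], aggfunc="sum")
print(df.index.get_level_values("month").tolist())
# ['April', 'August', 'December', 'February', 'January']
```

The order depends only on string comparison, so it looks like a sorting bug. It is the expected result until the month column carries an explicit order.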
I want to sort this pivot table by calendar month while keeping the client column, like this:
hours
client month
A January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
B January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
C January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
Scott's answer below solved the sorting problem. Now I want to add a row for each client containing the total hours used, like this:
hours
client month
A January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
Total 1000.34
B January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
Total 1000.34
C January 203.50
February 227.75
March 159.75
April 203.25
May 199.90
Total 1000.34
Any help would be greatly appreciated.
Best answer
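import pandas as pd

# 'Total' is appended as the last category, intended to sort after December
vp_clients['month'] = pd.Categorical(vp_clients['month'],
                                     ordered=True,
                                     categories=['January', 'February', 'March',
                                                 'April', 'May', 'June', 'July',
                                                 'August', 'September', 'October',
                                                 'November', 'December', 'Total'])
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'],
                    aggfunc='sum', observed=True)
df = df.dropna()
# DataFrame.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() replaces it
totals = df.groupby(level=0).sum().assign(month='Total').set_index('month', append=True)
pd.concat([df, totals]).sort_index()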
Output:
hours
client month
A January 11.25
February 16.25
April 11.25
August 11.25
December 16.25
Total 66.25
B January 11.25
February 16.25
April 8.25
August 11.25
December 16.25
Total 63.25
C January 11.25
February 16.25
April 14.25
August 11.25
December 16.25
Total 69.25
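DataFrame.sum(level=0) no longer exists in pandas 2.0. In addition, depending on the pandas version, pd.concat may not keep the month level categorical, which would put the months and Total back in alphabetical order. The sketch below sidesteps both problems by rebuilding the categorical month level after concatenation. It assumes vp_clients has the client, hours and month columns shown above. pivot_with_totals, MONTHS, ORDER and the sample rows are hypothetical names and data used for illustration:

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ORDER = MONTHS + ["Total"]  # "Total" is placed after December


def pivot_with_totals(data: pd.DataFrame) -> pd.DataFrame:
    """Sum hours per (client, month) in calendar order, plus a 'Total' row per client."""
    data = data.copy()
    data["month"] = pd.Categorical(data["month"], categories=ORDER, ordered=True)

    # observed=True keeps only (client, month) pairs present in the data,
    # so unused months never appear and no dropna() is needed.
    monthly = pd.pivot_table(data, values="hours", index=["client", "month"],
                             aggfunc="sum", observed=True)

    # groupby(level=...) replaces DataFrame.sum(level=...), removed in pandas 2.0.
    totals = monthly.groupby(level="client").sum()
    totals.index = pd.MultiIndex.from_product([totals.index, ["Total"]],
                                              names=["client", "month"])

    combined = pd.concat([monthly, totals])
    # Rebuild the month level as an ordered categorical so sort_index
    # follows ORDER regardless of what dtype concat produced.
    combined.index = pd.MultiIndex.from_arrays(
        [combined.index.get_level_values("client"),
         pd.Categorical(combined.index.get_level_values("month").astype(str),
                        categories=ORDER, ordered=True)],
        names=["client", "month"],
    )
    return combined.sort_index()


# Hypothetical sample in the same shape as vp_clients.
sample = pd.DataFrame({
    "client": ["A", "A", "A", "A", "A", "B", "B"],
    "hours": [6.50, 1.25, 2.50, 3.00, 0.50, 6.50, 1.25],
    "month": ["January", "January", "February", "December", "April", "April", "January"],
})
res = pivot_with_totals(sample)
print(res)
```

Calling pivot_with_totals(vp_clients) on the question's data should give the same per-client rows and totals as the output above. The difference is that the ordering no longer depends on how concat handles categorical index levels.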
Let's use pd.Categorical:
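import pandas as pd

# An ordered categorical makes pivot_table sort months by calendar position
vp_clients['month'] = pd.Categorical(vp_clients['month'],
                                     ordered=True,
                                     categories=['January', 'February', 'March',
                                                 'April', 'May', 'June', 'July',
                                                 'August', 'September', 'October',
                                                 'November', 'December'])
# observed=True drops months that never occur for a client (e.g. March)
df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'],
                    aggfunc='sum', observed=True)
df.dropna()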
Output:
hours
client month
A January 11.25
February 16.25
April 11.25
August 11.25
December 16.25
B January 11.25
February 16.25
April 8.25
August 11.25
December 16.25
C January 11.25
February 16.25
April 14.25
August 11.25
December 16.25
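If you would rather not change the column's dtype, you can instead sort the finished pivot table with a key function. This is a different technique from the pd.Categorical approach above. MONTH_RANK, month_key and the sample rows are hypothetical, and sort_index(key=...) requires pandas 1.1 or newer:

```python
import pandas as pd

# Calendar position of each month name, used as the sort key.
MONTH_RANK = {m: i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"])}


def month_key(level: pd.Index) -> pd.Index:
    # sort_index applies the key to each MultiIndex level separately;
    # only the level named "month" is remapped, "client" is left unchanged.
    return level.map(MONTH_RANK) if level.name == "month" else level


# Hypothetical sample in the same shape as vp_clients.
vp_clients = pd.DataFrame({
    "client": ["A", "A", "A", "B", "B"],
    "hours": [6.50, 2.50, 1.25, 0.50, 3.00],
    "month": ["December", "February", "April", "August", "January"],
})

df = pd.pivot_table(vp_clients, values="hours", index=["client", "month"], aggfunc="sum")
df = df.sort_index(key=month_key)  # labels stay as month names; only the order changes
print(df)
```

The key only changes how rows are compared, so the index still shows month names. A month name missing from MONTH_RANK would map to NaN and sort last.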
For "python - Sorting in Pandas pivot_table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48913757/