r - cumsum in grouped data with dplyr

Reposted. Author: 行者123. Updated: 2023-12-04 02:10:27

I have a data frame df (which can be downloaded here) containing a register of companies. It looks like this:

       Provider.ID        Local.Authority month year entry exit total
    1  1-102642676           Warwickshire    10 2010     2    0     2
    2  1-102642676                   Bury    10 2010     1    0     1
    3  1-102642676                   Kent    10 2010     1    0     1
    4  1-102642676                  Essex    10 2010     1    0     1
    5  1-102642676                Lambeth    10 2010     2    0     2
    6  1-102642676            East Sussex    10 2010     5    0     5
    7  1-102642676       Bristol, City of    10 2010     1    0     1
    8  1-102642676              Liverpool    10 2010     1    0     1
    9  1-102642676                 Merton    10 2010     1    0     1
    10 1-102642676          Cheshire East    10 2010     2    0     2
    11 1-102642676               Knowsley    10 2010     1    0     1
    12 1-102642676        North Yorkshire    10 2010     1    0     1
    13 1-102642676   Kingston upon Thames    10 2010     1    0     1
    14 1-102642676               Lewisham    10 2010     1    0     1
    15 1-102642676              Wiltshire    10 2010     1    0     1
    16 1-102642676              Hampshire    10 2010     1    0     1
    17 1-102642676             Wandsworth    10 2010     1    0     1
    18 1-102642676                  Brent    10 2010     1    0     1
    19 1-102642676            West Sussex    10 2010     1    0     1
    20 1-102642676 Windsor and Maidenhead    10 2010     1    0     1
    21 1-102642676                  Luton    10 2010     1    0     1
    22 1-102642676                Enfield    10 2010     1    0     1
    23 1-102642676               Somerset    10 2010     1    0     1
    24 1-102642676         Cambridgeshire    10 2010     1    0     1
    25 1-102642676             Hillingdon    10 2010     1    0     1
    26 1-102642676               Havering    10 2010     1    0     1
    27 1-102642676               Solihull    10 2010     1    0     1
    28 1-102642676                 Bexley    10 2010     1    0     1
    29 1-102642676               Sandwell    10 2010     1    0     1
    30 1-102642676            Southampton    10 2010     1    0     1
    31 1-102642676               Trafford    10 2010     1    0     1
    32 1-102642676                 Newham    10 2010     1    0     1
    33 1-102642676         West Berkshire    10 2010     1    0     1
    34 1-102642676                Reading    10 2010     1    0     1
    35 1-102642676             Hartlepool    10 2010     1    0     1
    36 1-102642676              Hampshire     3 2011     1    0     1
    37 1-102642676                   Kent     9 2011     0    1    -1
    38 1-102642676        North Yorkshire    12 2011     0    1    -1
    39 1-102642676         North Somerset    12 2012     2    0     2
    40 1-102642676                   Kent    10 2014     1    0     1
    41 1-102642676               Somerset     1 2016     0    1    -1

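My goal is to create a variable that reflects the cumulative sum of the last variable (total) for each Local.Authority and each year. total is simply the difference between entry and exit. I tried to do this with dplyr as follows: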

    library(dplyr)
    df.1 = df %>% group_by(Local.Authority, year) %>%
      mutate(cum.total = cumsum(total)) %>%
      arrange(year, month, Local.Authority)

which produces the following (wrong) result:

    > df.1
    Source: local data frame [41 x 8]
    Groups: Local.Authority, year [41]

       Provider.ID  Local.Authority month  year entry  exit total cum.total
            <fctr>           <fctr> <int> <int> <int> <int> <int>     <int>
    1  1-102642676           Bexley    10  2010     1     0     1        35
    2  1-102642676            Brent    10  2010     1     0     1        25
    3  1-102642676 Bristol, City of    10  2010     1     0     1        13
    4  1-102642676             Bury    10  2010     1     0     1         3
    5  1-102642676   Cambridgeshire    10  2010     1     0     1        31
    6  1-102642676    Cheshire East    10  2010     2     0     2        17
    7  1-102642676      East Sussex    10  2010     5     0     5        12
    8  1-102642676          Enfield    10  2010     1     0     1        29
    9  1-102642676            Essex    10  2010     1     0     1         5
    10 1-102642676        Hampshire    10  2010     1     0     1        23
    ..         ...              ...   ...   ...   ...   ...   ...       ...

I confirmed these results by checking a level of Local.Authority that appears in several different years (e.g. Kent):

    > check = df.1 %>% filter(Local.Authority == "Kent")
    > check
    Source: local data frame [3 x 8]
    Groups: Local.Authority, year [3]

       Provider.ID Local.Authority month  year entry  exit total cum.total
            <fctr>          <fctr> <int> <int> <int> <int> <int>     <int>
    1  1-102642676            Kent    10  2010     1     0     1         4
    2  1-102642676            Kent     9  2011     0     1    -1        42
    3  1-102642676            Kent    10  2014     1     0     1        44

whereas it should be:

       Provider.ID Local.Authority month  year entry  exit total cum.total
            <fctr>          <fctr> <int> <int> <int> <int> <int>     <int>
    1  1-102642676            Kent    10  2010     1     0     1         1
    2  1-102642676            Kent     9  2011     0     1    -1         0
    3  1-102642676            Kent    10  2014     1     0     1         1

Does anyone know why cumsum gives these results? Many thanks.
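The reported cum.total values are not random: they are exactly the running total of total over all 41 rows of df in their original order, with no grouping at all. The base-R sketch below copies the total column from the table in the question and checks this for the three Kent rows and for the ten rows printed in df.1. That points to the groups being ignored when mutate() ran, not to a problem with cumsum() itself (with correct grouping, the header Groups: Local.Authority, year [41] means 41 one-row groups, so cumsum would just return total). One commonly reported cause is attaching plyr after dplyr, so that plyr::mutate() masks dplyr::mutate() and evaluates over the whole data frame; the question does not say whether plyr was loaded, so treat that as a hypothesis. The names total, ungrouped, kent_rows and df1_rows are illustrative.

```r
# The `total` column of df in its original row order (rows 1-41 of the table above).
total <- c(2, 1, 1, 1, 2, 5, 1, 1, 1, 2,  # rows 1-10
           rep(1, 25),                    # rows 11-35 are all 1
           1, -1, -1, 2, 1, -1)           # rows 36-41

# Running total over the whole data frame, ignoring Local.Authority and year.
ungrouped <- cumsum(total)

# Kent sits in rows 3 (2010), 37 (2011) and 40 (2014) of the original table.
kent_rows <- c(3, 37, 40)
print(ungrouped[kent_rows])  # 4 42 44, the cum.total values in the Kent check

# Original row numbers of the ten rows shown in df.1 (Bexley, Brent, ..., Hampshire).
df1_rows <- c(28, 18, 7, 2, 24, 10, 6, 22, 4, 16)
print(ungrouped[df1_rows])   # 35 25 13 3 31 17 12 29 5 23
```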

Best answer

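When you group by Local.Authority and year, every group contains a single row (each Local.Authority appears at most once per year), so cumsum simply returns each row's own total: 1, -1, 1 for Kent. It is better to group by Local.Authority only; cumsum then accumulates total across the years and gives 1, 0, 1.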

    library(dplyr)

    df <- df %>%
      group_by(Local.Authority) %>%
      mutate(cum.to = cumsum(total))

    > df
    Source: local data frame [3 x 8]
    Groups: Local.Authority [1]

       Provider.ID Local.Authority month  year entry  exit total cum.to
             <chr>           <chr> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
    1  1-102642676            Kent    10  2010     1     0     1      1
    2  1-102642676            Kent     9  2011     0     1    -1      0
    3  1-102642676            Kent    10  2014     1     0     1      1
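As a cross-check of the answer, here is a base-R sketch in which ave() stands in for group_by() + mutate(). It uses the Kent and Somerset rows copied from the table in the question, deliberately shuffled. cumsum() accumulates in row order, and the answer's code relies on df already being sorted chronologically, so the sketch sorts by year and month first (in dplyr, adding arrange(year, month) before group_by(Local.Authority) does the same). The data frame name sub and the column cum.by_year are illustrative only.

```r
# Kent and Somerset rows from the question's table, in a shuffled order.
sub <- data.frame(
  Local.Authority = c("Somerset", "Kent", "Kent", "Somerset", "Kent"),
  month = c(1, 10, 10, 10, 9),
  year  = c(2016, 2014, 2010, 2010, 2011),
  total = c(-1, 1, 1, 1, -1)
)

# cumsum() follows row order, so sort chronologically before accumulating.
sub <- sub[order(sub$year, sub$month), ]

# Answer's grouping: Local.Authority only -> running total across years.
sub$cum.total <- ave(sub$total, sub$Local.Authority, FUN = cumsum)

# Question's grouping: Local.Authority and year -> one row per group, no accumulation.
sub$cum.by_year <- ave(sub$total, sub$Local.Authority, sub$year, FUN = cumsum)

print(sub)
```

For Kent, cum.total comes out as 1, 0, 1, matching the expected output in the question, while cum.by_year just repeats total.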

Regarding r - cumsum in grouped data with dplyr, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39080104/
