
r - group_by and create a sequence of monthly dates

Reposted. Author: 行者123. Updated: 2023-12-04 19:52:37

I have some data that looks like this:

     cusip       date start_date   end_date
1 00036020 2011-01-31 2011-07-29 2012-06-30
2 00036020 2011-02-28 2011-07-29 2012-06-30
3 00036020 2011-03-31 2011-07-29 2012-06-30
4 00036020 2011-04-29 2011-07-29 2012-06-30
5 00036020 2011-05-31 2011-07-29 2012-06-30
6 00036020 2011-06-30 2011-07-29 2012-06-30

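I want to group_by the id column and compute the end-of-month dates between start_date and end_date. Alternatively, I want to create a sequence of monthly dates between start_date and end_date that I can match the date column against.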

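I basically want to filter the grouped data down to the rows between the start date and the end date, but simply running filter(date >= start_date & date <= end_date) does not give the result I want.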

Doing the following:

df2 <- df %>%
  group_by(cusip, start_date, end_date) %>%
  filter(date >= start_date & date <= end_date)

returns:

> head(df2, 13)
# A tibble: 13 x 4
# Groups: cusip, start_date, end_date [3]
cusip date start_date end_date
<chr> <date> <date> <date>
1 00036020 2011-07-29 2011-07-29 2012-06-30
2 00036020 2011-08-31 2011-07-29 2012-06-30
3 00036020 2011-09-30 2011-07-29 2012-06-30
4 00036020 2011-10-31 2011-07-29 2012-06-30
5 00036020 2011-11-30 2011-07-29 2012-06-30
6 00036020 2011-12-30 2011-07-29 2012-06-30
7 00036020 2012-07-31 2012-07-31 2013-06-30
8 00036020 2012-08-31 2012-07-31 2013-06-30
9 00036020 2012-09-28 2012-07-31 2013-06-30
10 00036020 2012-10-31 2012-07-31 2013-06-30
11 00036020 2012-11-30 2012-07-31 2013-06-30
12 00036020 2012-12-31 2012-07-31 2013-06-30
13 00036020 2013-07-31 2013-07-31 2014-06-30

That is not the result I want either. Between rows 6 and 7 of this output, I lose 6 months of data.
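A small sketch of why the row-wise filter drops those months, using three rows copied from the expected-output table in the question (rows 12, 13 and 19). The name `toy` is made up for illustration. Row 13 is dated January 2012 but already carries the window that starts on 2012-07-31, so the comparison is FALSE for it.

```r
# Three rows taken from the question's data: row 12, row 13 and row 19
toy <- data.frame(
  date       = as.Date(c("2011-12-30", "2012-01-31", "2012-07-31")),
  start_date = as.Date(c("2011-07-29", "2012-07-31", "2012-07-31")),
  end_date   = as.Date(c("2012-06-30", "2013-06-30", "2013-06-30"))
)

# The filter condition is evaluated row by row, against each row's own window
in_window <- toy$date >= toy$start_date & toy$date <= toy$end_date
in_window
# TRUE FALSE TRUE: the January 2012 row falls before its own start_date
```

Grouping by cusip, start_date and end_date does not change this, because the condition only ever compares values within the same row.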

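I have a much larger data frame than this, and I want to filter it down to the rows whose date falls between the start_date and end_date columns.

I am just wondering how I can do this.

Data: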

    df <- structure(list(cusip = c("00036020", "00036020", "00036020", 
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036020", "00036020", "00036020",
"00036020", "00036020", "00036020", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110", "00036110", "00036110", "00036110",
"00036110", "00036110", "00036110"), date = structure(c(15005,
15033, 15064, 15093, 15125, 15155, 15184, 15217, 15247, 15278,
15308, 15338, 15370, 15399, 15429, 15460, 15491, 15520, 15552,
15583, 15611, 15644, 15674, 15705, 15736, 15764, 15792, 15825,
15856, 15884, 15917, 15947, 15978, 16009, 16038, 16070, 16101,
16129, 16160, 16190, 16220, 16251, 16282, 16311, 16343, 16374,
16402, 16435, 16465, 16493, 16525, 16555, 16584, 16616, 16647,
16678, 16708, 16738, 16769, 16800, 16829, 16829, 16860, 16860,
16891, 16891, 16920, 16920, 16952, 16952, 16982, 16982, 17011,
17011, 17044, 17044, 17074, 17074, 17105, 17105, 17135, 17135,
17165, 17165, 17197, 17225, 17256, 17284, 17317, 17347, 17378,
17409, 17438, 17470, 17500, 17529, 17562, 17590, 17619, 17651,
17682, 17711, 17743, 17774, 17802, 17835, 17865, 17896, 12814,
12842, 12873, 12902, 12934, 12964, 12993, 13026, 13056, 13087,
13117, 13147, 13179, 13207, 13238, 13266, 13299, 13329, 13360,
13391, 13420, 13452, 13482, 13511, 13544, 13572, 13602, 13633,
13664, 13693, 13725, 13756, 13784, 13817, 13847, 13878, 13909,
13938, 13969, 13999, 14029, 14060, 14091, 14120, 14152, 14183,
14211, 14244, 14274, 14302, 14334, 14364, 14393, 14425, 14456,
14487, 14517, 14547, 14578, 14609, 14638, 14666, 14699, 14729,
14757, 14790, 14820, 14852, 14882, 14911, 14943, 14974, 15005,
15033, 15064, 15093, 15125, 15155, 15184, 15217, 15247, 15278,
15308, 15338, 15370, 15399, 15429, 15460, 15491, 15520, 15552,
15583, 15611, 15644, 15674, 15705, 15736, 15764, 15792, 15825,
15856, 15884, 15917, 15947, 15978, 16009, 16038, 16070, 16101,
16129, 16160, 16190, 16220, 16251, 16282, 16311, 16343, 16374,
16402, 16435, 16465, 16493, 16525, 16555, 16584, 16616, 16647,
16678, 16708, 16738, 16769, 16800, 16829, 16860, 16891, 16920,
16952, 16982, 17011, 17044, 17074, 17105, 17135, 17165, 17197,
17225, 17256, 17284, 17317, 17347, 17378, 17409, 17438, 17470,
17500, 17529), class = "Date"), start_date = structure(c(15184,
15184, 15184, 15184, 15184, 15184, 15184, 15184, 15184, 15184,
15184, 15184, 15552, 15552, 15552, 15552, 15552, 15552, 15552,
15552, 15552, 15552, 15552, 15552, 15917, 15917, 15917, 15917,
15917, 15917, 15917, 15917, 15917, 15917, 15917, 15917, 16282,
16282, 16282, 16282, 16282, 16282, 16282, 16282, 16282, 16282,
16282, 16282, 16647, 16647, 16647, 16647, 16647, 16647, 16647,
16647, 16647, 16647, 16647, 16647, 17011, 17011, 17011, 17011,
17011, 17011, 17011, 17011, 17011, 17011, 17011, 17011, 17011,
17011, 17011, 17011, 17011, 17011, 17011, 17011, 17011, 17011,
17011, 17011, 17378, 17378, 17378, 17378, 17378, 17378, 17378,
17378, 17378, 17378, 17378, 17378, 17743, 17743, 17743, 17743,
17743, 17743, 17743, 17743, 17743, 17743, 17743, 17743, 13360,
13360, 13360, 13360, 13360, 13360, 13360, 13360, 13360, 13360,
13360, 13360, 13725, 13725, 13725, 13725, 13725, 13725, 13725,
13725, 13725, 13725, 13725, 13725, 14091, 14091, 14091, 14091,
14091, 14091, 14091, 14091, 14091, 14091, 14091, 14091, 14456,
14456, 14456, 14456, 14456, 14456, 14456, 14456, 14456, 14456,
14456, 14456, 14820, 14820, 14820, 14820, 14820, 14820, 14820,
14820, 14820, 14820, 14820, 14820, 15184, 15184, 15184, 15184,
15184, 15184, 15184, 15184, 15184, 15184, 15184, 15184, 15552,
15552, 15552, 15552, 15552, 15552, 15552, 15552, 15552, 15552,
15552, 15552, 15917, 15917, 15917, 15917, 15917, 15917, 15917,
15917, 15917, 15917, 15917, 15917, 16282, 16282, 16282, 16282,
16282, 16282, 16282, 16282, 16282, 16282, 16282, 16282, 16647,
16647, 16647, 16647, 16647, 16647, 16647, 16647, 16647, 16647,
16647, 16647, 17011, 17011, 17011, 17011, 17011, 17011, 17011,
17011, 17011, 17011, 17011, 17011, 17378, 17378, 17378, 17378,
17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378, 17743,
17743, 17743, 17743, 17743, 17743, 17743, 17743, 17743, 17743,
17743, 17743), class = "Date"), end_date = structure(c(15521,
15521, 15521, 15521, 15521, 15521, 15521, 15521, 15521, 15521,
15521, 15521, 15886, 15886, 15886, 15886, 15886, 15886, 15886,
15886, 15886, 15886, 15886, 15886, 16251, 16251, 16251, 16251,
16251, 16251, 16251, 16251, 16251, 16251, 16251, 16251, 16616,
16616, 16616, 16616, 16616, 16616, 16616, 16616, 16616, 16616,
16616, 16616, 16982, 16982, 16982, 16982, 16982, 16982, 16982,
16982, 16982, 16982, 16982, 16982, 17347, 17347, 17347, 17347,
17347, 17347, 17347, 17347, 17347, 17347, 17347, 17347, 17347,
17347, 17347, 17347, 17347, 17347, 17347, 17347, 17347, 17347,
17347, 17347, 17712, 17712, 17712, 17712, 17712, 17712, 17712,
17712, 17712, 17712, 17712, 17712, 18077, 18077, 18077, 18077,
18077, 18077, 18077, 18077, 18077, 18077, 18077, 18077, 13694,
13694, 13694, 13694, 13694, 13694, 13694, 13694, 13694, 13694,
13694, 13694, 14060, 14060, 14060, 14060, 14060, 14060, 14060,
14060, 14060, 14060, 14060, 14060, 14425, 14425, 14425, 14425,
14425, 14425, 14425, 14425, 14425, 14425, 14425, 14425, 14790,
14790, 14790, 14790, 14790, 14790, 14790, 14790, 14790, 14790,
14790, 14790, 15155, 15155, 15155, 15155, 15155, 15155, 15155,
15155, 15155, 15155, 15155, 15155, 15521, 15521, 15521, 15521,
15521, 15521, 15521, 15521, 15521, 15521, 15521, 15521, 15886,
15886, 15886, 15886, 15886, 15886, 15886, 15886, 15886, 15886,
15886, 15886, 16251, 16251, 16251, 16251, 16251, 16251, 16251,
16251, 16251, 16251, 16251, 16251, 16616, 16616, 16616, 16616,
16616, 16616, 16616, 16616, 16616, 16616, 16616, 16616, 16982,
16982, 16982, 16982, 16982, 16982, 16982, 16982, 16982, 16982,
16982, 16982, 17347, 17347, 17347, 17347, 17347, 17347, 17347,
17347, 17347, 17347, 17347, 17347, 17712, 17712, 17712, 17712,
17712, 17712, 17712, 17712, 17712, 17712, 17712, 17712, 18077,
18077, 18077, 18077, 18077, 18077, 18077, 18077, 18077, 18077,
18077, 18077), class = "Date")), row.names = c(NA, -264L), class = "data.frame")

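Edit: expected output:

The expected output basically "replicates" the date column. So create a sequence of dates as follows:

First 24 observations:

The first sequence runs from 2011-07-29 to 2012-06-30, so it starts at row 7 (all rows marked with ** will be dropped, because their date is earlier than start_date). The sequence should last 12 months, seq(from = as.Date("2011-07-29"), to = as.Date("2012-06-30"), by = "months"), and end at row 18. A new sequence starts at row 19, because start_date is 2012-07-31.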

          cusip       date start_date   end_date
** 1 00036020 2011-01-31 2011-07-29 2012-06-30
** 2 00036020 2011-02-28 2011-07-29 2012-06-30
** 3 00036020 2011-03-31 2011-07-29 2012-06-30
** 4 00036020 2011-04-29 2011-07-29 2012-06-30
** 5 00036020 2011-05-31 2011-07-29 2012-06-30
** 6 00036020 2011-06-30 2011-07-29 2012-06-30
7 00036020 2011-07-29 2011-07-29 2012-06-30
8 00036020 2011-08-31 2011-07-29 2012-06-30
9 00036020 2011-09-30 2011-07-29 2012-06-30
10 00036020 2011-10-31 2011-07-29 2012-06-30
11 00036020 2011-11-30 2011-07-29 2012-06-30
12 00036020 2011-12-30 2011-07-29 2012-06-30
13 00036020 2012-01-31 2012-07-31 2013-06-30
14 00036020 2012-02-29 2012-07-31 2013-06-30
15 00036020 2012-03-30 2012-07-31 2013-06-30
16 00036020 2012-04-30 2012-07-31 2013-06-30
17 00036020 2012-05-31 2012-07-31 2013-06-30
18 00036020 2012-06-29 2012-07-31 2013-06-30
19 00036020 2012-07-31 2012-07-31 2013-06-30
20 00036020 2012-08-31 2012-07-31 2013-06-30
21 00036020 2012-09-28 2012-07-31 2013-06-30
22 00036020 2012-10-31 2012-07-31 2013-06-30
23 00036020 2012-11-30 2012-07-31 2013-06-30
24 00036020 2012-12-31 2012-07-31 2013-06-30
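As a hedged illustration of the seq() call mentioned above: the generated dates keep the day of month of the start date (the 29th here), so they will not equal the actual month-end dates in the date column. One possible workaround, which is an assumption of this sketch and not something stated in the question, is to compare on year-month only. The vector `dates` below is a hand-picked subset of the question's date column.

```r
# Monthly sequence for the first window in the question
monthly <- seq(from = as.Date("2011-07-29"),
               to   = as.Date("2012-06-30"),
               by   = "month")
length(monthly)
# 12 dates, from 2011-07-29 to 2012-06-29

# A few dates from the question's data (rows 6, 7, 8, 18 and 19)
dates <- as.Date(c("2011-06-30", "2011-07-29", "2011-08-31",
                   "2012-06-29", "2012-07-31"))

# Match on the year-month part instead of the exact day
keep <- format(dates, "%Y-%m") %in% format(monthly, "%Y-%m")
keep
# FALSE TRUE TRUE TRUE FALSE
```

Comparing on year-month sidesteps the mismatch between the 29th and the real last trading day of each month.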

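I am thinking I should make start_date and end_date unique values and filter from there.

Best answer

If we need to create a sequence of dates for each 'start_date' and its corresponding 'end_date', this can be done with map2. No grouping is needed here, because map2 builds the sequence for each corresponding start date / end date pair.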

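library(dplyr)  # %>% and mutate()
library(purrr)  # map2()
# One daily sequence per row, stored in the list-column Seq
df %>%
  mutate(Seq = map2(start_date, end_date, seq, by = '1 day'))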
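Since the question asks for monthly dates, the same map2 pattern could also be tried with a monthly step. This is only a sketch: the tibble `windows` is a hypothetical two-row stand-in holding the first two start/end pairs from the question, and `by = "month"` is swapped in for the answer's `by = '1 day'`.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in: the first two distinct windows of cusip 00036020
windows <- tibble(
  start_date = as.Date(c("2011-07-29", "2012-07-31")),
  end_date   = as.Date(c("2012-06-30", "2013-06-30"))
)

# One monthly sequence per row, stored in a list-column
windows_seq <- windows %>%
  mutate(Seq = map2(start_date, end_date, seq, by = "month"))

# The first window yields 12 monthly dates
length(windows_seq$Seq[[1]])
```

Note that base R's seq() steps the month while keeping the day of month, so a start on the 31st may roll over into the following month when a month is shorter. The generated days are therefore not guaranteed to be month ends.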

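Update

Based on the OP's comment:

library(dplyr)
df %>%
  group_by(cusip) %>%
  mutate(rn = row_number()) %>%  # row position within each cusip
  # keep every row from the first in-window date onwards
  filter(cummax(date >= start_date & date <= end_date) > 0)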
# A tibble: 102 x 5
# Groups: cusip [1]
# cusip date start_date end_date rn
# <chr> <date> <date> <date> <int>
# 1 00036020 2011-07-29 2011-07-29 2012-06-30 7
# 2 00036020 2011-08-31 2011-07-29 2012-06-30 8
# 3 00036020 2011-09-30 2011-07-29 2012-06-30 9
# 4 00036020 2011-10-31 2011-07-29 2012-06-30 10
# 5 00036020 2011-11-30 2011-07-29 2012-06-30 11
# 6 00036020 2011-12-30 2011-07-29 2012-06-30 12
# 7 00036020 2012-01-31 2012-07-31 2013-06-30 13
# 8 00036020 2012-02-29 2012-07-31 2013-06-30 14
# 9 00036020 2012-03-30 2012-07-31 2013-06-30 15
#10 00036020 2012-04-30 2012-07-31 2013-06-30 16
# … with 92 more rows

- checking the first 24 rows
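To see what the cummax() filter in the update does, here is a minimal sketch on a made-up logical vector. `in_window` stands in for date >= start_date & date <= end_date within a single cusip, and its values are invented for illustration.

```r
# Made-up condition results for seven consecutive rows of one group
in_window <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# cummax() treats FALSE/TRUE as 0/1 and carries the running maximum forward,
# so the result becomes 1 at the first TRUE and stays 1 afterwards
seen <- cummax(in_window)
seen
# 0 0 1 1 1 1 1

# Rows kept by filter(cummax(...) > 0)
which(seen > 0)
# 3 4 5 6 7
```

Only the leading rows before the first in-window date are dropped. Later rows are kept even when their own condition is FALSE, which is why rows 13 to 16 appear in the output above although their date is earlier than their start_date.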

Regarding r - group_by and create a sequence of monthly dates, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58050764/
