
r - How to properly join panel data to extract missing values?

Reposted. Author: 行者123. Updated: 2023-12-04 12:30:21

I want to left join panel data because some observations are missing. However, I cannot do this and keep the panel structure:

Data:

# package I'm using
library(dplyr)

# named dates so that data.frame(b, dates, c, d) below finds it
dates <- as.Date(c("2015-02-13",
"2015-02-14",
"2015-02-16",
"2015-02-17",
"2015-02-14",
"2015-02-16",
"2015-02-13",
"2015-02-14",
"2015-02-17"))

b <-c("John","John","John","John","Michael","Michael","Thomas","Thomas","Thomas")
c <- c(20,30,26,20,30,40,5,10,4)
d <- c(11,2233,12,2,22,13,23,23,100)
# put together
df <- data.frame(b, dates,c,d)

df
b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-16 26 12
#4 John 2015-02-17 20 2
#5 Michael 2015-02-14 30 22
#6 Michael 2015-02-16 40 13
#7 Thomas 2015-02-13 5 23
#8 Thomas 2015-02-14 10 23
#9 Thomas 2015-02-17 4 100

What I tried was to create a complete vector of dates and left join it:

# one row per calendar day, in a column named date
date <- data.frame(date = seq(as.Date("2015-02-13"), as.Date("2015-02-17"), by = "days"))

# and left join:

t <- left_join(date,df,by=c("date"="dates"))

t

date b c d
#1 2015-02-13 John 20 11
#2 2015-02-13 Thomas 5 23
#3 2015-02-14 John 30 2233
#4 2015-02-14 Michael 30 22
#5 2015-02-14 Thomas 10 23
#6 2015-02-15 <NA> NA NA
#7 2015-02-16 John 26 12
#8 2015-02-16 Michael 40 13
#9 2015-02-17 John 20 2
#10 2015-02-17 Thomas 4 100

How can I get a result like this:

     b      dates  c    d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100

Best answer

We can use expand.grid to build every combination of person and date, then left join the data onto that grid:

library(dplyr)
# the date column in df is named dates, so the grid uses the same name
expand.grid(b = unique(df$b), dates = seq(min(df$dates), max(df$dates), by = "1 day")) %>%
  left_join(df, by = c("b", "dates")) %>%
  arrange(b, dates)
# b dates c d
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100
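
To see which person-date pairs the grid adds, one option is `anti_join`, which keeps the grid rows that have no match in the data. A minimal sketch, assuming the date column is named `dates` as in the question; `all_pairs` and `missing_pairs` are illustrative names:

```r
library(dplyr)

# data from the question
df <- data.frame(
  b = c("John", "John", "John", "John", "Michael", "Michael",
        "Thomas", "Thomas", "Thomas"),
  dates = as.Date(c("2015-02-13", "2015-02-14", "2015-02-16", "2015-02-17",
                    "2015-02-14", "2015-02-16",
                    "2015-02-13", "2015-02-14", "2015-02-17")),
  c = c(20, 30, 26, 20, 30, 40, 5, 10, 4),
  d = c(11, 2233, 12, 2, 22, 13, 23, 23, 100)
)

# every person crossed with every day in the observed range
all_pairs <- expand.grid(
  b = unique(df$b),
  dates = seq(min(df$dates), max(df$dates), by = "1 day"),
  stringsAsFactors = FALSE   # keep b as character so it matches df$b
)

# grid rows with no observation in df
missing_pairs <- anti_join(all_pairs, df, by = c("b", "dates")) %>%
  arrange(b, dates)

missing_pairs
```

With this data the grid has 15 rows and `df` has 9, so `missing_pairs` holds the 6 combinations that show up as NA rows in the output above.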

Or use complete from tidyr:

library(tidyr)
# the date column in df is named dates
complete(df, b, dates = seq(min(dates), max(dates), by = "1 day"))
# b dates c d
# <fctr> <date> <dbl> <dbl>
#1 John 2015-02-13 20 11
#2 John 2015-02-14 30 2233
#3 John 2015-02-15 NA NA
#4 John 2015-02-16 26 12
#5 John 2015-02-17 20 2
#6 Michael 2015-02-13 NA NA
#7 Michael 2015-02-14 30 22
#8 Michael 2015-02-15 NA NA
#9 Michael 2015-02-16 40 13
#10 Michael 2015-02-17 NA NA
#11 Thomas 2015-02-13 5 23
#12 Thomas 2015-02-14 10 23
#13 Thomas 2015-02-15 NA NA
#14 Thomas 2015-02-16 NA NA
#15 Thomas 2015-02-17 4 100
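
If the gaps should be filled with a default value rather than NA, `complete` also takes a `fill` argument, and `tidyr::full_seq` can generate the daily sequence. A minimal sketch under the assumption that 0 is an acceptable placeholder for `c` and `d`, which is an illustration and not something the original answer proposes; `filled` is an illustrative name:

```r
library(tidyr)

# data from the question
df <- data.frame(
  b = c("John", "John", "John", "John", "Michael", "Michael",
        "Thomas", "Thomas", "Thomas"),
  dates = as.Date(c("2015-02-13", "2015-02-14", "2015-02-16", "2015-02-17",
                    "2015-02-14", "2015-02-16",
                    "2015-02-13", "2015-02-14", "2015-02-17")),
  c = c(20, 30, 26, 20, 30, 40, 5, 10, 4),
  d = c(11, 2233, 12, 2, 22, 13, 23, 23, 100)
)

# full_seq(dates, period = 1) spans min(dates) to max(dates) in 1-day steps;
# fill replaces the NA values in the listed columns (0 is an assumed placeholder)
filled <- complete(
  df, b,
  dates = full_seq(dates, period = 1),
  fill = list(c = 0, d = 0)
)

filled
```

Leaving out `fill` gives the same 15 rows with NA in `c` and `d`, matching the output above.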

Regarding "r - How to properly join panel data to extract missing values?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38159925/
