
r - Animals in the zoo: can we aggregate a daily time series of factors and flag activity by ID?

Reposted. Author: 行者123. Updated: 2023-12-04 19:09:35

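Suppose we have a daily time series of events for animals in a zoo, spanning many years. A subset of a very large dataset might look like this: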

library(data.table)
type <- c(rep('giraffe',90),rep('monkey',90),rep('anteater',90))
status <- as.factor(c(rep('display',31),rep('caged',28),rep('display',31),
rep('caged',25), rep('display',35),rep('caged',30),rep('caged',10),
rep('display',10),rep('caged',10),rep('display',60)))
date <- rep(seq.Date( as.Date("2001-01-01"), as.Date("2001-03-31"), "day" ),3)
Here "type" is the kind of animal and "status" indicates what the animal was doing that day, e.g. caged or on display.
animals <-  data.table(type,status,date);animals
type status date
1: giraffe display 2001-01-01
2: giraffe display 2001-01-02
3: giraffe display 2001-01-03
4: giraffe display 2001-01-04
5: giraffe display 2001-01-05
---
266: anteater display 2001-03-27
267: anteater display 2001-03-28
268: anteater display 2001-03-29
269: anteater display 2001-03-30
270: anteater display 2001-03-31
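Suppose we want to aggregate this into a monthly series that lists each animal with its status information for the whole month. In the new series, "status" reflects the animal's status at the start of the month. "fullmonth" is a binary variable (1 = TRUE, 0 = FALSE) indicating whether that status persisted for the entire month, and "anydisp" is a binary variable (1 = TRUE, 0 = FALSE) indicating whether the animal was on display at any time during the month (>= 1 day). So, because the giraffe was on display for all of January and March but caged for all of February, it is flagged accordingly.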
date <- rep(seq.Date( as.Date("2001-01-01"), as.Date("2001-03-31"),"month"),3)
type <- c(rep('giraffe',3),rep('monkey',3),rep('anteater',3))
status <- as.factor(c('display','caged','display','caged','display','caged',
'caged','display','display'))
fullmonth <- c(1,1,1,0,1,0,0,1,1)
anydisp <- c(1,0,1,1,1,1,1,1,1)

animals2 <- data.table(date,type,status,fullmonth,anydisp);animals2
date type status fullmonth anydisp
2001-01-01 giraffe display 1 1
2001-02-01 giraffe caged 1 0
2001-03-01 giraffe display 1 1
2001-01-01 monkey caged 0 1
2001-02-01 monkey display 1 1
2001-03-01 monkey caged 0 1
2001-01-01 anteater caged 0 1
2001-02-01 anteater display 1 1
2001-03-01 anteater display 1 1
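I thought zoo might be the way to go, but after playing around with it I found that it does not handle non-numeric values well, and even if I assign arbitrary values to the qualitative component (status), it is not clear how that would solve the problem.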
##aggregate function with zoo? 
library(zoo)
animals$activity <- as.numeric(ifelse(status=='display',1,0))
animals2 <- subset(animals, select=c(date,activity))
datas <- zoo(animals2)
monthlyzoo <- aggregate(datas,as.yearmon,sum)
Error in Summary.factor(1L, na.rm = FALSE) :
sum not meaningful for factors
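Does anyone know of a solution using sqldf or data.table?
Update
I would like to add a new requirement: the date shown should be the first day of the month even if the data starts later in that month. For example, this dataset illustrates such a case: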
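The error message above indicates that a factor column reached sum(). As a rough sketch only (not part of the original question), zoo can aggregate a single animal's series if the zoo object holds just a numeric indicator indexed by date. The names days, shown, anydisp and fullmonth below are illustrative, and the data is the giraffe's series rebuilt from the question.

```r
library(zoo)

# Rebuild the giraffe's daily series from the question: 31 display days,
# 28 caged days, 31 display days, covering 2001-01-01 to 2001-03-31.
days <- seq(as.Date("2001-01-01"), as.Date("2001-03-31"), by = "day")
status <- c(rep("display", 31), rep("caged", 28), rep("display", 31))

# Numeric indicator indexed by date; no factor column enters the zoo object.
shown <- zoo(as.numeric(status == "display"), order.by = days)

# max() per calendar month is 1 if the animal was on display on any day.
anydisp <- aggregate(shown, as.yearmon, max)

# min() == max() within a month means the status never changed that month.
fullmonth <- aggregate(shown, as.yearmon,
                       function(x) as.numeric(min(x) == max(x)))
```

This handles one animal at a time, so a multi-animal dataset would still need a split by type, which is where data.table or sqldf may be more convenient.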
animals2 <- animals[30:270,];head(animals2)

setkey(animals2, "type", "date")

oo <- animals2[, list(date=date[1], status = status[1],
fullmonth = 1 * all(status == status[1]),
anydisplay = any(status == "display") * 1 ),
by = list(month(date), type)][, month := NULL]
oo

type date status fullmonth anydisplay
1: anteater 2001-01-01 caged 0 1
2: anteater 2001-02-01 display 1 1
3: anteater 2001-03-01 display 1 1
4: giraffe 2001-01-30 display 1 1
5: giraffe 2001-02-01 caged 1 0
6: giraffe 2001-03-01 display 1 1
7: monkey 2001-01-01 caged 0 1
8: monkey 2001-02-01 display 1 1
9: monkey 2001-03-01 display 0 1

library(sqldf)
sqldf("select
min(date) date,
type,
status,
max(status) = min(status) fullmonth,
sum(status = 'display') > 0 anydisp
from animals2
group by type, strftime('%Y %m', date * 3600 * 24, 'unixepoch')
order by type, date")

date type status fullmonth anydisp
1 2001-01-01 anteater caged 0 1
2 2001-02-01 anteater display 1 1
3 2001-03-01 anteater display 1 1
4 2001-01-30 giraffe display 1 1
5 2001-02-01 giraffe caged 1 0
6 2001-03-01 giraffe display 1 1
7 2001-01-01 monkey caged 0 1
8 2001-02-01 monkey display 1 1
9 2001-03-01 monkey caged 0 1
This can be accommodated by post-processing the result of any of the solutions to modify the date:
dateswitch <- paste(year(animals2$date),month(animals2$date),1,sep='/')
dateswitch <- as.Date(dateswitch, "%Y/%m/%d")
animals2$date <- as.Date(dateswitch)
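A possibly simpler way to express the same post-processing step is to format each date with a fixed day of "01". The helper name first_of_month is hypothetical and not from the original post; this is a base R sketch only.

```r
# Hypothetical helper (not from the original post): map each date to the
# first day of its calendar month by formatting with a literal day of 01.
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

d <- as.Date(c("2001-01-30", "2001-02-14", "2001-03-31"))
month_start <- first_of_month(d)
```

Because the year is kept in the formatted string, the same month in different years maps to different dates.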

Best answer

Something like this?

setkey(animals, "type", "date")
oo <- animals[, list(date=date[1], status = status[1],
fullmonth = 1 * all(status == status[1]),
anydisplay = any(status == "display") * 1),
by = list(month(date), type)][, month := NULL]
# type date status fullmonth anydisplay
# 1: anteater 2001-01-01 caged 0 1
# 2: anteater 2001-02-01 display 1 1
# 3: anteater 2001-03-01 display 1 1
# 4: giraffe 2001-01-01 display 1 1
# 5: giraffe 2001-02-01 caged 1 0
# 6: giraffe 2001-03-01 display 1 1
# 7: monkey 2001-01-01 caged 0 1
# 8: monkey 2001-02-01 display 1 1
# 9: monkey 2001-03-01 display 0 1
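The grouping above uses month(date), which returns 1 to 12, so the same month from different years would fall into one group in a multi-year dataset. The following sketch, which is an assumption-laden variant and not the answerer's code, groups by the first day of each calendar month instead. That keeps years apart and also reports the 1st of the month when the data starts later, as the question's update asks. The names late, monthly and month_start are illustrative.

```r
library(data.table)

# Rebuild the daily data from the question.
type <- c(rep("giraffe", 90), rep("monkey", 90), rep("anteater", 90))
status <- c(rep("display", 31), rep("caged", 28), rep("display", 31),
            rep("caged", 25), rep("display", 35), rep("caged", 30),
            rep("caged", 10), rep("display", 10), rep("caged", 10),
            rep("display", 60))
date <- rep(seq(as.Date("2001-01-01"), as.Date("2001-03-31"), by = "day"), 3)
animals <- data.table(type, status, date)

# Drop the first 29 rows, as in the question's update, so that the first
# series begins on 2001-01-30 rather than on the 1st.
late <- animals[30:270]
setkey(late, type, date)

# Group by type and by the first day of each calendar month (year included).
# status[1] is the earliest observed status in the month because of the key.
monthly <- late[, .(status = status[1],
                    fullmonth = as.integer(all(status == status[1])),
                    anydisp = as.integer(any(status == "display"))),
                by = .(type, month_start = as.Date(format(date, "%Y-%m-01")))]
monthly
```

Note that when a month's data starts late, "status" here is the first observed status in that month, which is not necessarily the status on the 1st.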

Regarding r - Animals in the zoo: can we aggregate a daily time series of factors and flag activity by ID?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16504750/
