
r - Calculating the duration and order of non-continuous events in R

Reposted. Author: 行者123. Updated: 2023-12-01 22:48:00

My dataset contains a series of behaviors observed in videos. For each behavior, I recorded when it started and when it ended.

datain <-data.frame(
A=c("1/5+11/18","0/5","7/10"),
B=c("6/10+19/25","11/15","11/20"),
C=c("26/30","6/10","0/6"))

I would like to get the duration of each behavior and the order of the behaviors for each observation, as in this desired output:

dataout <-data.frame(
A=c("1/5+11/18","0/5","7/10"),
B=c("6/10+19/25","11/15","11/20"),
C=c("26/30","6/10","0/6"),
A.sum=c(11,5,3),
B.sum=c(10,4,9),
C.sum=c(4,4,6),
myorder=c("A/B/A/B/C","A/C/B","C/A/B"))

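I am trying to use the following lines to identify which columns contain a + and to extract the rows with interrupted behaviors (I still have to compute the duration of each behavior), but I suspect there is a better approach than the one I am currently attempting.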

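library(stringr)  # provides str_which()

# for each column, find the rows whose value contains a "+"
d.1 <- lapply(datain, function(x) str_which(x,"\\+"))
d.2 <- which(lapply(d.1,length)>0)
coltosum <- match(names(d.2),colnames(datain))

# split the interrupted bouts in those columns on "+"
mylist <- lapply(datain[coltosum],function(x) strsplit(x,"\\+"))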

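As always, I would appreciate any suggestions.

Please note that I edited this question a few days later to include the order of the behaviors in the desired output.

UPDATE: I have been able to figure out how to get the order of the behaviors. I bet there are more elegant and concise ways to get this result. The code is below: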

library(stringr) # str_extract_all(), str_replace_all()
library(plyr)    # ldply()

#removing empty columns
empty_columns <- sapply(datain, function(x) all(is.na(x) | x == ""))
datain<- datain[, !empty_columns]

#loop 1#
#this loop is for taking the occurrence of BH
mylist <- list()

for (i in seq(1,nrow(datain))){
mylist <- apply(datain,1,str_extract_all,pattern="\\d+")
myindx <- sapply(mylist, length)
myres <- c(do.call(cbind,lapply(mylist, `length<-`,max(myindx))))
names(myres) <- rep(colnames(datain),nrow(datain))
mydf <- ldply(myres,data.frame)
colnames(mydf) <- c("BH","values")
}

#loop 2#
#this loop is for counting the number of elements in a nested list
mydf.1 <- list()
myres.2 <- list()

for (i in seq(1,nrow(datain))){
mydf.1 <- length(unlist(mylist[i]))
myres.2[i] <- mydf.1
}

#this is for placing the row values
names(myres.2) <- rownames(datain)
myres.3 <- as.numeric(myres.2)

mydf$myrow <- c(rep(rownames(datain),myres.3))

#I can order by row and by values
mydf <- mydf[order(as.numeric(mydf$myrow),as.numeric(mydf$values)),]

#I have to pick up the right values
#I have to generate as many sequences as many elements for each row.
myseq <- sequence(myres.3)
mydf <- cbind(mydf,myseq)

myseq.2 <- seq(1,nrow(mydf),by=2)

#selecting the df according to the uneven row
mydf.1 <- mydf[myseq.2,]
myorder <-split(mydf.1,mydf.1$myrow)

#loop 3

myres.3 <- list()
for (i in seq(1,nrow(datain))){
myres.3 <- lapply(myorder,"[",i=1)
}

myorder.def <- data.frame(cbind(lapply(myres.3,paste0,collapse="/")))
colnames(myorder.def) <- "BH"

#last step, apply str_extract_all for each row
myorder.def$BH <- str_replace_all(myorder.def$BH,"c","")
myorder.def$BH <- str_replace_all(myorder.def$BH,"\\(","")
myorder.def$BH <- str_replace_all(myorder.def$BH,"\\)","")
myorder.def$BH <- str_replace_all(myorder.def$BH,"\"","")
myorder.def$BH <- str_replace_all(myorder.def$BH,", ","/")

data.out <- cbind(datain,myorder.def)
data.out

Steve

Best answer

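One option in base R is to loop over the columns of the dataset (lapply), then replace each run of digits (\\d+) followed by / and more digits with a subtraction, by capturing the digits and swapping the backreferences ((\\2-\\1)), and finally evaluate the resulting string with eval(parse(...)):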

datain[paste0(names(datain), ".sum")] <- lapply(datain, function(y) 
sapply(gsub("(\\d+)/(\\d+)", "(\\2-\\1)", y),
function(x) eval(parse(text = x))))

-checking against the OP's output

> datain
A B C A.sum B.sum C.sum
1 3/4+6/8+11/16 0/5+15/20 0/5 8 10 5
2 0/5 5/10 3/10 5 5 7
> dataout
A B C A.sum B.sum C.sum
1 3/4+6/8+11/16 0/5+10/5 0/5 8 10 5
2 0/5 5/10 3/10 5 5 7
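To make the two steps of the base R approach easier to follow, here is a minimal sketch that applies them to column A of the three-row data from the question. The intermediate names expr and a_sum are only illustrative and do not appear in the original answer.

```r
# Data from the question (three observations; behaviours A, B, C).
datain <- data.frame(
  A = c("1/5+11/18", "0/5", "7/10"),
  B = c("6/10+19/25", "11/15", "11/20"),
  C = c("26/30", "6/10", "0/6"))

# Step 1: rewrite every "start/end" pair as "(end-start)".
# "1/5+11/18" becomes "(5-1)+(18-11)"; the "+" between bouts is untouched.
expr <- gsub("(\\d+)/(\\d+)", "(\\2-\\1)", datain$A)
print(expr)

# Step 2: evaluate each arithmetic string to get the total duration.
a_sum <- vapply(expr, function(x) eval(parse(text = x)), numeric(1),
                USE.NAMES = FALSE)
print(a_sum)  # 11 5 3, matching A.sum in the desired output
```

Note that eval(parse()) runs whatever text it is given, so this only seems reasonable when the cells are known to contain nothing but digits, "/" and "+".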

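Or with tidyverse: group by row (rowwise), loop across all the columns, read each string into a data.frame with read.table, subtract the columns, take the sum, and return the result as new columns by modifying .names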
library(dplyr)
library(stringr)
datain %>%
rowwise %>%
mutate(across(everything(), ~ sum(with(read.table(text =
str_replace_all(.x, fixed("+"), "\n"), sep = "/",
header = FALSE), V2 - V1)), .names = "{.col}.sum")) %>%
ungroup

-output

# A tibble: 2 × 6
A B C A.sum B.sum C.sum
<chr> <chr> <chr> <int> <int> <int>
1 3/4+6/8+11/16 0/5+15/20 0/5 8 10 5
2 0/5 5/10 3/10 5 5 7
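The snippets above compute the durations only. For the myorder column requested in the question, one possible sketch reuses the same read.table idea to pull out the start time of every bout and then sorts the behavior labels by start time. The helpers parse_bouts and row_order are hypothetical names introduced here, not part of the original answer, and the sketch assumes every cell is non-empty and well formed.

```r
# Data from the question (three observations; behaviours A, B, C).
datain <- data.frame(
  A = c("1/5+11/18", "0/5", "7/10"),
  B = c("6/10+19/25", "11/15", "11/20"),
  C = c("26/30", "6/10", "0/6"))

# Hypothetical helper: parse one cell such as "1/5+11/18" into a
# data.frame with one row per bout (V1 = start, V2 = end).
parse_bouts <- function(cell) {
  read.table(text = gsub("+", "\n", cell, fixed = TRUE),
             sep = "/", header = FALSE)
}

# Hypothetical helper: given one row as a named character vector,
# return the behaviour labels sorted by the start time of each bout.
row_order <- function(row) {
  starts <- lapply(row, function(cell) parse_bouts(cell)$V1)
  labels <- rep(names(row), lengths(starts))
  paste(labels[order(unlist(starts))], collapse = "/")
}

myorder <- vapply(
  seq_len(nrow(datain)),
  function(i) row_order(vapply(datain[i, ], as.character, character(1))),
  character(1))
print(myorder)  # "A/B/A/B/C" "A/C/B" "C/A/B"
```

Bouts that start at the same time keep their column order, because order() is stable; whether that is the right tie-break depends on the data.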

Regarding r - Calculating the duration and order of non-continuous events in R, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/74933820/
