
r - Find the unique sets of strings in a vector, where vector elements can be multiple strings

Reposted. Author: 行者123. Updated: 2023-12-01 02:02:19

I have a series of batch records that are labeled sequentially. Sometimes the batches overlap.

x <- c("1","1","1/2","2","3","4","5/4","5")
> data.frame(x)
    x
1   1
2   1
3 1/2
4   2
5   3
6   4
7 5/4
8   5

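I want to find the sets of non-overlapping batches and label those periods. Batch "1/2" includes both "1" and "2", so it is not unique. Batch "3" is not contained in any previous batch, so it starts a new period. I'm having trouble handling the merged batches; otherwise this would be straightforward. The result should be:
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3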

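My experience is more with functional programming paradigms, so I know the way I'm doing this is very un-R. I'm looking for a way to do this in R that is clean and simple. Any help is appreciated.

Here is my un-R code, but it is very clunky and not extensible.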
x <- c("1","1","1/2","2","3","4","5/4","5")

p <- 1 #period number
temp <- NULL #temp variable for storing cases of x (batches)
temp[1] <- x[1]
period <- NULL
rl <- 0 #length to repeat period

for (i in 1:length(x)){

  #check for "/", split and add to temp
  if (grepl("/", x[i])){
    z <- strsplit(x[i], "/") #split character
    z <- unlist(z) #convert to vector
    temp <- c(temp, z, x[i]) #add to temp vector for comparison
  }

  #check if x in temp
  if(x[i] %in% temp){
    temp <- append(temp, x[i]) #add to search vector
    rl <- rl + 1 #increase length
  } else {
    period <- append(period, rep(p, rl)) #add to period vector
    p <- p + 1 #increase period count
    temp <- NULL #reset
    rl <- 1 #reset
  }
}

#add last batch

rl <- length(x) - length(period)
period <- append(period, rep(p,rl))

df <- data.frame(x,period)

> df
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3

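Best answer

R has functional-paradigm influences, so you can solve this with Map and Reduce. Note that this solution follows your approach of merging the values seen so far. If you assume the batch numbers are consecutive, as they are in your example, a simpler approach is possible.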

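x <- c("1","1","1/2","2","3","4","5/4","5")
# split merged batches such as "1/2" into their parts
s <- strsplit(x, "/")
# cumulative union of the values seen so far (one extra leading element for init)
r <- Reduce(union, s, init = list(), accumulate = TRUE)
# TRUE where none of the current values has been seen before; cumsum numbers the periods
p <- cumsum(Map(function(x, y) length(intersect(x, y)) == 0, s, r[-length(r)]))

data.frame(x, period = p)
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3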

What this does is first calculate a cumulative union of seen values. Then, it maps across this to determine the places where none of the current values have been seen before. (Alternatively, this second step could be included within the reduce, but this would be wordier without support for destructuring.) The cumulative sum provides the "period" numbers based on the number of times the intersections have come up empty.
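The parenthetical about folding the second step into the reduce can be sketched as follows. This is only an illustration under stated assumptions, not code from the original answer: `assign_periods`, `step`, and the `seen`/`period` fields are hypothetical names. Without destructuring, the accumulator has to be a small list that carries both pieces of state.

```r
# Hypothetical helper (not from the original answer): one Reduce whose
# accumulator holds both the batches seen so far and the current period.
assign_periods <- function(x) {
  s <- strsplit(x, "/", fixed = TRUE)

  step <- function(state, batch) {
    # A batch starts a new period when it shares nothing with the seen set
    is_new <- length(intersect(batch, state$seen)) == 0
    list(seen = union(state$seen, batch),
         period = state$period + is_new)
  }

  init <- list(seen = character(0), period = 0L)
  states <- Reduce(step, s, init, accumulate = TRUE)

  # Drop the initial state and extract the period recorded at each element
  vapply(states[-1], function(st) st$period, integer(1))
}

x <- c("1","1","1/2","2","3","4","5/4","5")
assign_periods(x)
#> [1] 1 1 1 1 2 3 3 3
```

The result matches the two-step version; the cost is the extra bookkeeping needed to pack and unpack the state list.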

If you do make the assumption that the batch numbers are consecutive then you can do the following instead

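x <- c("1","1","1/2","2","3","4","5/4","5")
s<-strsplit(x,"/")
# 2 x length(x) matrix: row 1 is the min batch number, row 2 the max
n<-mapply(function(x) range(as.numeric(x)),s)
# a new period starts where the current min exceeds the previous max
p<-cumsum(c(1,n[1,-1]>n[2,-ncol(n)]))

data.frame(x,period=p)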

This gives the same result (not repeated here).
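To make the consecutive-numbering assumption concrete, the sketch below wraps both approaches in functions and runs them on an input where batch labels do not arrive in increasing order. The function names `periods_by_seen` and `periods_by_range` and the vector `y` are made up for this illustration; `periods_by_seen` also uses `character(0)` and `mapply` rather than `list()` and `Map`, which should be equivalent for this purpose.

```r
# Hypothetical wrappers around the two approaches, for comparison only.
periods_by_seen <- function(x) {
  s <- strsplit(x, "/", fixed = TRUE)
  r <- Reduce(union, s, character(0), accumulate = TRUE)
  is_new <- mapply(function(cur, seen) length(intersect(cur, seen)) == 0,
                   s, r[-length(r)])
  cumsum(is_new)
}

periods_by_range <- function(x) {
  s <- strsplit(x, "/", fixed = TRUE)
  n <- vapply(s, function(b) range(as.numeric(b)), numeric(2))
  cumsum(c(1, n[1, -1] > n[2, -ncol(n)]))
}

# On the example from the question, both agree
x <- c("1","1","1/2","2","3","4","5/4","5")
periods_by_seen(x)
#> [1] 1 1 1 1 2 3 3 3
periods_by_range(x)
#> [1] 1 1 1 1 2 3 3 3

# Assumed counterexample: labels that are not in increasing order
y <- c("1", "3", "2", "1/3")
periods_by_seen(y)
#> [1] 1 2 3 3
periods_by_range(y)
#> [1] 1 2 2 2
```

On `y` the two disagree because the range shortcut only compares each element with its immediate predecessor, so it is safe only when the ordering assumption actually holds for the data.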

Regarding "r - Find the unique sets of strings in a vector, where vector elements can be multiple strings", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35231446/
