
r - Conditionally replace missing values based on surrounding non-missing values

Reposted. Author: 行者123. Updated: 2023-12-04 18:01:17

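I am trying to replace missing values (NA) in a vector. NAs between two equal numbers should be replaced by that number. NAs between two different values should remain NA. For example, given the vector "a", I would like to obtain "b".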

a = c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)
b = c(1, 1, 1, 1, 1, NA, NA, NA, 2, 2, 2, 2, 3, 3, 3, 3)

As you can see, the second run of NAs, between the values 1 and 2, is not replaced.

Is there a way to vectorize this computation?

Best answer

The OP asked for a vectorized solution, so here is a possible vectorized base R solution (no for loops) that also handles leading and trailing NAs.

# Define a vector with leading/trailing NAs
a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3, NA, NA)

# Save the boolean vector as we are going to reuse it a lot
na_vals <- is.na(a)

# For each NA, find the index of the closest non-NA value to its left
ind <- findInterval(which(na_vals), which(!na_vals))

# Find the positions where two consecutive non-NA values are equal
ind2 <- which(!diff(a[!na_vals]))

# Fill only the NAs that lie between equal consecutive values
a[na_vals] <- a[!na_vals][ind2[match(ind, ind2)]]
a
# [1] NA NA 1 1 1 1 1 NA NA NA 2 2 2 2 3 3 3 3 NA NA
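To make the steps above easier to follow, here is a minimal sketch that wraps the same logic in a function and applies it to the vector from the question. The function name `fill_between_equal` and the early return for vectors with no NAs (or only NAs) are assumptions added for illustration; they are not part of the original answer.

```r
# Hypothetical helper (name not from the original answer): fills a run of NAs
# only when the nearest non-NA values on both sides of the run are equal.
fill_between_equal <- function(x) {
  na_vals <- is.na(x)
  if (!any(na_vals) || all(na_vals)) return(x)  # nothing to fill

  vals <- x[!na_vals]  # the non-NA values, in their original order

  # For each NA: index (into vals) of the closest non-NA value to its left.
  # 0 means the NA sits before the first non-NA value.
  ind <- findInterval(which(na_vals), which(!na_vals))

  # Indices i such that vals[i] == vals[i + 1]
  ind2 <- which(diff(vals) == 0)

  # match() returns NA when the left neighbour does not start an equal pair,
  # so those positions are assigned NA and stay missing
  x[na_vals] <- vals[ind2[match(ind, ind2)]]
  x
}

a <- c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)
b <- c(1, 1, 1, 1, 1, NA, NA, NA, 2, 2, 2, 2, 3, 3, 3, 3)
identical(fill_between_equal(a), b)
# [1] TRUE
```

For this input the left-neighbour indices of the three NA runs are 1, 2 and 3 (plus 5 for the last run), and only 1, 3 and 5 start an equal pair, which is why the run between 1 and 2 is left untouched.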

Some timing comparisons on a large vector
# Create a big vector
set.seed(123)
a <- sample(c(NA, 1:5), 5e7, replace = TRUE)

############################################
##### Cainã Max Couto-Silva

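fill_data <- function(vec) {

  for (l in unique(vec[!is.na(vec)])) {

    g <- which(vec %in% l)

    indexes <- list()

    # seq_len() gives an empty loop when a value occurs only once,
    # whereas 1:(length(g) - 1) would iterate over c(1, 0) and fail
    for (i in seq_len(length(g) - 1)) {
      indexes[[i]] <- (g[i] + 1):(g[i + 1] - 1)
    }

    for (i in seq_len(length(g) - 1)) {
      if (all(is.na(vec[indexes[[i]]]))) {
        vec[indexes[[i]]] <- l
      }
    }
  }

  return(vec)
}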

system.time(res <- fill_data(a))
# user system elapsed
# 81.73 4.41 86.48

############################################
##### Henrik

library(zoo)  # provides na.approx() and na.locf()

system.time({
  a_ap <- na.approx(a, na.rm = FALSE)
  a_locf <- na.locf(a, na.rm = FALSE)
  a[which(a_ap == a_locf)] <- a_ap[which(a_ap == a_locf)]
})
# user system elapsed
# 12.55 3.39 15.98

# Validate
identical(res, as.integer(a))
# [1] TRUE

############################################
##### David

## Recreate a, as it has been overwritten
set.seed(123)
a <- sample(c(NA, 1:5), 5e7, replace = TRUE)

system.time({
  # Save the boolean vector as we are going to reuse it a lot
  na_vals <- is.na(a)

  # For each NA, find the index of the closest non-NA value to its left
  ind <- findInterval(which(na_vals), which(!na_vals))

  # Find the positions where two consecutive non-NA values are equal
  ind2 <- which(!diff(a[!na_vals]))

  # Fill only the NAs that lie between equal consecutive values
  a[na_vals] <- a[!na_vals][ind2[match(ind, ind2)]]
})
# user system elapsed
# 3.39 0.71 4.13

# Validate
identical(res, a)
# [1] TRUE
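Henrik's approach in the timings above compares a linear interpolation with a last-observation-carried-forward fill and keeps only the positions where the two agree. The sketch below illustrates why that works on the small vector from the question. It uses base R stand-ins (`approx()` and a `cumsum()` index) in place of `zoo::na.approx()` and `zoo::na.locf()`; these stand-ins are an assumption made so the example runs without extra packages, and they are not the code that was timed.

```r
a <- c(1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, 2, 3, NA, NA, 3)
b <- c(1, 1, 1, 1, 1, NA, NA, NA, 2, 2, 2, 2, 3, 3, 3, 3)

ok <- !is.na(a)

# Base R stand-in for na.approx(): linear interpolation between the
# surrounding non-NA values (positions outside their range stay NA)
a_ap <- approx(which(ok), a[ok], xout = seq_along(a))$y

# Base R stand-in for na.locf(): carry the last non-NA value forward.
# cumsum(ok) counts the non-NA values seen so far; a count of 0 maps to NA.
a_locf <- c(NA, a[ok])[cumsum(ok) + 1]

# Interpolation equals the carried-forward value only when both
# neighbours are equal, so only those gaps get filled
same <- which(a_ap == a_locf)
a[same] <- a_ap[same]

identical(a, b)
# [1] TRUE
```

Between 1 and 2 the interpolated values are 1.25, 1.5 and 1.75, which differ from the carried-forward 1, so that run stays NA.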

Regarding r - Conditionally replace missing values based on surrounding non-missing values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49696217/
