
r - Imputing NA observations using linear approximation

Reposted. Author: 行者123. Updated: 2023-12-01 23:43:35

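I want to impute NA observations at the start of an array by extrapolating from the two following non-NA observations with a linear approximation, and then do the same for NA observations at the end of the array using the two preceding non-NA observations.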

A reproducible example of my df:

M=matrix(sample(1:9,10*10,T),10);M[sample(1:length(M),0.5*length(M),F)]=NA;dimnames(M)=list(paste(rep("City",dim(M)[1]),1:dim(M)[1],sep=""),paste(rep("Year",dim(M)[2]),1:dim(M)[2],sep=""))
M

Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4 5 NA 3 NA NA NA 5 NA
City2 6 NA 3 3 NA 4 6 NA NA 7
City3 NA 7 NA 8 8 NA NA 8 NA 5
City4 3 5 3 NA NA 3 5 9 8 7
City5 4 6 6 NA NA 8 NA 7 1 NA
City6 NA NA NA NA 4 NA 8 3 6 7
City7 9 3 NA NA NA NA NA 4 NA NA
City8 5 6 9 8 5 NA NA 1 4 NA
City9 NA NA 6 NA 3 3 8 NA 7 NA
City10 NA NA NA NA NA NA NA NA NA 1

idx=rowSums(!is.na(M))>=2 # Index of rows with 2 or more non-NA to run na.approx

library(zoo)
M[idx,]=t(na.approx(t(M[idx,]),rule=1,method="linear")) # I'm using t as na.approx works on columns

Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 NA NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1

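I want to extrapolate the boundaries (for City1 and City9) using a linear approximation based on the two following or preceding observations. For example, M[1,1] should be 3 and M[1,10] should be 5.5.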

Do you know how I could do this?

Best answer

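In extrap, nlead is the number of leading NAs in the input vector x, and non.na is the subset of elements of x that are not NA. If there are no leading NAs, or fewer than 2 non-NA elements, the input is returned unchanged. m is the slope between the first two non-NA values. The first nlead elements of x are then replaced with the extrapolated values. To use it, we fill the interior gaps in each row with na.approx, apply extrap to each row, assigning via MM[] <- so that the row and column names are kept, and then reverse each row, apply extrap again, and reverse back: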

library(zoo)

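extrap <- function(x) {
  # number of leading NAs: x * 0 is 0 at observed positions and NA elsewhere
  nlead <- which.min(x * 0) - 1
  non.na <- na.omit(x)
  # return x as-is if it is all NA, has no leading NA, or has fewer than 2 observations
  if (length(nlead) == 0 || nlead == 0 || length(non.na) < 2) return(x)
  m <- diff(head(non.na, 2))  # slope between the first two non-NA values
  replace(x, seq_len(nlead), non.na[1] - nlead:1 * m)
}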

nc <- ncol(M)

naApprox <- function(x) if (length(na.omit(x)) < 2) x else na.approx(x, na.rm = FALSE)
MM <- M
MM[] <- t(apply(MM, 1, naApprox))

MM[] <- t(apply(MM, 1, extrap)) # extrapolate to fill leading NAs
MM[] <- t(apply(MM[, nc:1], 1, extrap))[, nc:1] # extrapolate to fill trailing NAs

giving:

> MM
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 3.0 4.0 5.000000 4.000000 3.000000 3.500000 4.000000 4.500000 5.000000 5.500000
City2 6.0 4.5 3.000000 3.000000 3.500000 4.000000 6.000000 6.333333 6.666667 7.000000
City3 6.5 7.0 7.500000 8.000000 8.000000 8.000000 8.000000 8.000000 6.500000 5.000000
City4 3.0 5.0 3.000000 3.000000 3.000000 3.000000 5.000000 9.000000 8.000000 7.000000
City5 4.0 6.0 6.000000 6.666667 7.333333 8.000000 7.500000 7.000000 1.000000 -5.000000
City6 -4.0 -2.0 0.000000 2.000000 4.000000 6.000000 8.000000 3.000000 6.000000 7.000000
City7 9.0 3.0 3.166667 3.333333 3.500000 3.666667 3.833333 4.000000 4.166667 4.333333
City8 5.0 6.0 9.000000 8.000000 5.000000 3.666667 2.333333 1.000000 4.000000 7.000000
City9 9.0 7.5 6.000000 4.500000 3.000000 3.000000 8.000000 7.500000 7.000000 6.500000
City10 NA NA NA NA NA NA NA NA NA 1.000000
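
As a check on the City1 row of that output, the arithmetic inside extrap can be traced by hand on a single vector. This is only a sketch in base R: the names first_obs, obs, slope, lead_fill, ntrail, trail_fill and filled are introduced here for illustration and are not part of the answer, and the vector is assumed to have at least one NA at each end and interior gaps already interpolated.

```r
# City1 after interior interpolation: one leading NA and one trailing NA.
x <- c(NA, 4, 5, 4, 3, 3.5, 4, 4.5, 5, NA)

# x * 0 is 0 where x is observed and NA elsewhere, so which.min()
# returns the position of the first non-NA value.
first_obs <- which.min(x * 0)
nlead <- first_obs - 1              # number of leading NAs

obs <- x[!is.na(x)]
slope <- obs[2] - obs[1]            # slope between the first two observations

# Step backwards from the first observation, one slope per position.
lead_fill <- obs[1] - (nlead:1) * slope

# The trailing end is the same computation on the reversed vector.
rev_obs <- rev(obs)
ntrail <- which.min(rev(x) * 0) - 1
trail_fill <- rev(rev_obs[1] - (ntrail:1) * (rev_obs[2] - rev_obs[1]))

filled <- x
filled[seq_len(nlead)] <- lead_fill
filled[length(x) - ntrail + seq_len(ntrail)] <- trail_fill
print(filled)
```

The first element comes out as 4 - 1 * (5 - 4) = 3 and the last as 5 + 0.5 = 5.5, which are the values requested in the question for M[1,1] and M[1,10].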

Note: We used the following as M:

M <- structure(c(NA, 6L, NA, 3L, 4L, NA, 9L, 5L, NA, NA, 4L, NA, 7L, 
5L, 6L, NA, 3L, 6L, NA, NA, 5L, 3L, NA, 3L, 6L, NA, NA, 9L, 6L,
NA, NA, 3L, 8L, NA, NA, NA, NA, 8L, NA, NA, 3L, NA, 8L, NA, NA,
4L, NA, 5L, 3L, NA, NA, 4L, NA, 3L, 8L, NA, NA, NA, 3L, NA, NA,
6L, NA, 5L, NA, 8L, NA, NA, 8L, NA, NA, NA, 8L, 9L, 7L, 3L, 4L,
1L, NA, NA, 5L, NA, NA, 8L, 1L, 6L, NA, 4L, 7L, NA, NA, 7L, 5L,
7L, NA, 7L, NA, NA, NA, 1L), .Dim = c(10L, 10L), .Dimnames = list(
c("City1", "City2", "City3", "City4", "City5", "City6", "City7",
"City8", "City9", "City10"), c("Year1", "Year2", "Year3",
"Year4", "Year5", "Year6", "Year7", "Year8", "Year9", "Year10"
)))
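
For readers who would like to cross-check the result without zoo, the same idea can be sketched in base R using stats::approx for the interior gaps. fill_row is a hypothetical helper written for this illustration, not a function from zoo or from the answer; it assumes, as the answer does, that interior gaps are interpolated first and that each end is extended along the line through its two nearest filled positions.

```r
# fill_row() is a hypothetical helper, not part of zoo or of the answer.
fill_row <- function(v) {
  obs <- which(!is.na(v))
  if (length(obs) < 2) return(v)    # fewer than 2 observations: no line to fit

  # Interior gaps: linear interpolation between neighbouring observations.
  # rule = 1 leaves positions outside the observed range as NA.
  out <- approx(obs, v[obs], xout = seq_along(v), rule = 1)$y

  first <- obs[1]
  last <- obs[length(obs)]

  # Leading NAs: extend the line through the first two filled positions.
  if (first > 1) {
    k <- seq_len(first - 1)
    out[k] <- out[first] - (first - k) * (out[first + 1] - out[first])
  }
  # Trailing NAs: extend the line through the last two filled positions.
  if (last < length(v)) {
    k <- (last + 1):length(v)
    out[k] <- out[last] + (k - last) * (out[last] - out[last - 1])
  }
  out
}

# The City9 row of M.
city9 <- c(NA, NA, 6, NA, 3, 3, 8, NA, 7, NA)
print(fill_row(city9))
```

On the City9 row this gives 9, 7.5, 6, 4.5, 3, 3, 8, 7.5, 7, 6.5, and a row with a single observation such as City10 is returned unchanged. Applied to the whole matrix it could be used as MM[] <- t(apply(M, 1, fill_row)).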

Update: Fixed.

Regarding r - imputing NA observations using linear approximation, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30167674/
