gpt4 book ai didi

r - 使用 data.table 提取行中的最后一个非缺失值

转载 作者:行者123 更新时间:2023-12-03 21:13:32 35 4
gpt4 key购买 nike

我有一个因子列的 data.table,我想提取每行中最后一个非缺失值的标签。这是一种典型的max.col情况,但我不想不必要地强制,因为我正在尝试使用 data.table 优化此代码。真实数据也有其他类型的列。

这是例子,

## Some sample data
set.seed(0)
dat <- sapply(split(letters[1:25], rep.int(1:5, 5)), sample, size=8, replace=TRUE)
dat[upper.tri(dat)] <- NA
dat[4:5, 4:5] <- NA # the real data isnt nice and upper.triangular
dat <- data.frame(dat, stringsAsFactors = TRUE) # factor columns

## So, it looks like this
setDT(dat)[]
# X1 X2 X3 X4 X5
# 1: u NA NA NA NA
# 2: f q NA NA NA
# 3: f b w NA NA
# 4: k g h NA NA
# 5: u b r NA NA
# 6: f q w x t
# 7: u g h i e
# 8: u q r n t

## I just want to get the labels of the factors
## that are 'rightmost' in each row. I tried a number of things
## that probably don't make sense here.
## This just about gets the column index
dat[, colInd := sum(!is.na(.SD)), by=1:nrow(dat)]

不过,这是提取这些标签的目标,这里使用常规基本函数。
## Using max.col and a data.frame
df1 <- as.data.frame(dat)
inds <- max.col(is.na(as.matrix(df1)), ties="first")-1
inds[inds==0] <- ncol(df1)
df1[cbind(1:nrow(df1), inds)]
# [1] "u" "q" "w" "h" "r" "t" "e" "t"

最佳答案

这是另一种方式:

dat[, res := NA_character_]
for (v in rev(names(dat))[-1]) dat[is.na(res), res := get(v)]


X1 X2 X3 X4 X5 res
1: u NA NA NA NA u
2: f q NA NA NA q
3: f b w NA NA w
4: k g h NA NA h
5: u b r NA NA r
6: f q w x t t
7: u g h i e e
8: u q r n t t

基准 使用与@alexis_laz 相同的数据并对函数进行(显然)表面上的更改,我看到了不同的结果。只是在这里展示它们以防万一有人好奇。亚历克西斯的答案(稍作修改)仍然领先。

职能:
alex = function(x, ans = rep_len(NA, length(x[[1L]])), wh = seq_len(length(x[[1L]]))){
if(!length(wh)) return(ans)
ans[wh] = as.character(x[[length(x)]])[wh]
Recall(x[-length(x)], ans, wh[is.na(ans[wh])])
}

alex2 = function(x){
x[, res := NA_character_]
wh = x[, .I]
for (v in (length(x)-1):1){
if (!length(wh)) break
set(x, j="res", i=wh, v = x[[v]][wh])
wh = wh[is.na(x$res[wh])]
}
x$res
}

frank = function(x){
x[, res := NA_character_]
for(v in rev(names(x))[-1]) x[is.na(res), res := get(v)]
return(x$res)
}

frank2 = function(x){
x[, res := NA_character_]
for(v in rev(names(x))[-1]) x[is.na(res), res := .SD, .SDcols=v]
x$res
}

示例数据和基准:
DAT1 = as.data.table(lapply(ceiling(seq(0, 1e4, length.out = 1e2)), 
function(n) c(rep(NA, n), sample(letters, 3e5 - n, TRUE))))
DAT2 = copy(DAT1)
DAT3 = as.list(copy(DAT1))
DAT4 = copy(DAT1)

library(microbenchmark)
microbenchmark(frank(DAT1), frank2(DAT2), alex(DAT3), alex2(DAT4), times = 30)

Unit: milliseconds
expr min lq mean median uq max neval
frank(DAT1) 850.05980 909.28314 985.71700 979.84230 1023.57049 1183.37898 30
frank2(DAT2) 88.68229 93.40476 118.27959 107.69190 121.60257 346.48264 30
alex(DAT3) 98.56861 109.36653 131.21195 131.20760 149.99347 183.43918 30
alex2(DAT4) 26.14104 26.45840 30.79294 26.67951 31.24136 50.66723 30

关于r - 使用 data.table 提取行中的最后一个非缺失值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33664659/

35 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com