
r - Combining data.table columns that contain NA

Reposted. Author: 行者123. Updated: 2023-12-04 12:14:36

I have a set of five columns in a data.table.

library(data.table)

dt <- data.table(
city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
neighborhoods.3 = c(NA, "h", rep(NA, 6)),
irrelevantdata = c(1:8)
)

   city neighborhoods.1 neighborhoods.2 neighborhoods.3 irrelevantdata
1:    1              NA              NA              NA              1
2:    1               a               f               h              2
3:    2               b               g              NA              3
4:    2               c              NA              NA              4
5:    3              NA              NA              NA              5
6:    3              NA              NA              NA              6
7:    4               d              NA              NA              7
8:    4               e              NA              NA              8

I want to combine the first four columns into a single column.
   neighborhood
1: 1
2: 1-a-f-h
3: 2-b-g
4: 2-c
5: 3
6: 3
7: 4-d
8: 4-e

As you can see, I am dropping the NA entries and separating the remaining values with -.

I tried the following, which clearly mishandles j:
business[
,
neighborhood = paste0(
city,
if(!is.na(neighborhoods.1)) paste0("-", neighborhoods.1),
if(!is.na(neighborhoods.2)) paste0("-", neighborhoods.2),
if(!is.na(neighborhoods.3)) paste0("-", neighborhoods.3),
""
)
]

How can I do this?

Updated to reflect that there are additional columns I do not want to combine.

Best answer

One option is to paste the elements of each row together using do.call, then remove the NA elements and the extra - from the resulting vector.

dt[,.(neighborhood = gsub('-NA|NA-', '', 
do.call(paste, c(.SD, sep='-')))), .SDcols= city:neighborhoods.3]
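One caveat with the pattern '-NA|NA-' is that it also matches inside longer values that merely start with "NA". The following is a minimal sketch, not part of the original answer, that anchors the pattern so only whole NA tokens are removed. It assumes the first pasted column (city) is never NA; the names cols, pasted, res and naive are illustrative only.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

# Columns to combine (illustrative helper name)
cols <- c("city", "neighborhoods.1", "neighborhoods.2", "neighborhoods.3")

# Paste row-wise; NA values become the literal text "NA"
pasted <- dt[, do.call(paste, c(.SD, sep = "-")), .SDcols = cols]

# Remove "-NA" only when it is a whole token, i.e. followed by "-" or end of string.
# Assumption: the first column is never NA, so no leading "NA-" needs handling.
res <- gsub("-NA(?=-|$)", "", pasted, perl = TRUE)

# The unanchored pattern damages a value that begins with "NA"
naive <- gsub("-NA|NA-", "", "1-NAPA")
anchored <- gsub("-NA(?=-|$)", "", "1-NAPA", perl = TRUE)
```

Neither pattern can tell a missing value apart from a neighborhood whose name is literally "NA"; if that matters, an approach that drops the NA values before pasting is the safer choice.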

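Another option is to group by row number, unlist the subset of the data.table (.SD), remove the NA elements (na.omit), and paste the remaining elements together. The columns to use for this operation can be specified in .SDcols.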
dt[, .(neighborhood = paste(na.omit(unlist(.SD)), collapse = '-')),
   by = 1:nrow(dt), .SDcols = city:neighborhoods.3]
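Since the question mentions other columns that should be kept, the same row-wise idea can be written with := so the combined column is added to dt by reference. This is a sketch under that assumption, not the answer's own code; cols is an illustrative name, and dropping the source columns at the end is optional.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

# Columns to combine (illustrative helper name)
cols <- c("city", "neighborhoods.1", "neighborhoods.2", "neighborhoods.3")

# One group per row: flatten the row, drop NA values, join with "-"
dt[, neighborhood := paste(na.omit(unlist(.SD)), collapse = "-"),
   by = seq_len(nrow(dt)), .SDcols = cols]

# Optionally drop the source columns, keeping everything else
dt[, (cols) := NULL]
```

Grouping by row calls paste once per row, so on very large tables a vectorized approach may be noticeably faster.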

Yet another option, suggested by @Frank, is to melt a subset of the dataset (restricted to the desired columns) into long format and then paste:
mycols <- setdiff(names(dt), 'irrelevantdata')
na.omit(melt(dt[, ..mycols][, r := .I],
    id.vars = "r"))[, .(neighborhood = paste(value, collapse = "-")), by = r]
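A variation on the melt approach, sketched here as an assumption rather than as the answer's method, uses melt's na.rm argument instead of wrapping the result in na.omit, and converts city to character first so all measured columns share one type. The names wide, long and res are illustrative.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

mycols <- setdiff(names(dt), "irrelevantdata")

# Working copy holding only the columns to combine
wide <- dt[, ..mycols]
wide[, city := as.character(city)]  # same type as the neighborhood columns
wide[, r := .I]                     # row id to group by after melting

# Long format; na.rm = TRUE drops the NA values during the melt
long <- melt(wide, id.vars = "r", na.rm = TRUE)

# Collapse each original row back into one string
res <- long[, .(neighborhood = paste(value, collapse = "-")), by = r]
```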

Regarding "r - Combining data.table columns that contain NA", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33764714/
