gpt4 book ai didi

r - 如何更改数据表中的几个列名

转载 作者:行者123 更新时间:2023-12-03 15:22:33 25 4
gpt4 key购买 nike

我有一个包含10列的数据表。

town    
tc
one
two
three
four
five
six
seven
total

需要为我使用的“一”到“总计”列生成均值,
DTmean <- DT[,(lapply(.SD,mean)),by = .(town,tc),.SDcols=3:10]

这会产生平均值,但随后我希望将列名后缀为“_mean”。我们应该怎么做?希望前两列与“town”和“tc”保持相同。我尝试了下面的方法,但随后将所有“一个”重命名为“总计”,仅重命名为“_mean”
for (i in 3:10) {
setnames(DTmean,i,paste0(names(i),"_mean"))
}

最佳答案

如果要以data.table方式进行操作,则应按以下方式使用setnames:

setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))

或者:
cols <- names(DT)[3:10]
setnames(DTmean, cols, paste0(cols, '_mean'))

此外,在汇总所有其他列时,不需要 .SDcols语句。因此,使用 DT[, lapply(.SD,mean), by = .(town,tc)]应该会给您与使用 DT[, (lapply(.SD,mean)), by = .(town,tc), .SDcols=3:10]相同的结果。

在以下示例数据集上:
set.seed(71)
DT <- data.table(town = rep(c('A','B'), each=10),
tc = rep(c('C','D'), 10),
one = rnorm(20,1,1),
two = rnorm(20,2,1),
three = rnorm(20,3,1),
four = rnorm(20,4,1),
five = rnorm(20,5,2),
six = rnorm(20,6,2),
seven = rnorm(20,7,2),
total = rnorm(20,28,3))

使用:
DTmean <- DT[, lapply(.SD,mean), by = .(town,tc)]
setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))

给出:
> DTmean
town tc one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean
1: A C 1.7368898 1.883586 3.358440 4.849896 4.742609 5.089877 6.792513 29.20286
2: A D 0.8906842 1.826135 3.267684 3.760931 6.210145 7.320693 5.571687 26.56142
3: B C 1.4037955 2.474836 2.587920 3.719658 3.446612 6.510183 8.309784 27.80012
4: B D 0.8103511 1.153000 3.360940 3.945082 5.555999 6.198380 8.652779 28.95180

回复您的评论:如果您想同时计算均值和 sd,则可以这样做(改编自我的回答 here):
DT[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))))), by = .(town,tc)]

这使:
   town tc  one.mean    one.sd two.mean    two.sd three.mean  three.sd four.mean  four.sd five.mean   five.sd six.mean    six.sd seven.mean seven.sd total.mean total.sd
1: A C 0.2981842 0.3556520 1.578174 0.7788545 2.232366 0.9047046 4.896201 1.238877 4.625866 0.7436584 7.607439 1.7262628 7.949366 1.772771 28.94287 3.902602
2: A D 1.2099018 1.0205252 1.686068 1.5497989 2.671027 0.8323733 4.811279 1.404794 7.235969 0.7883873 6.765797 2.7719942 6.657298 1.107843 27.42563 3.380785
3: B C 0.9238309 0.6679821 2.525485 0.8054734 3.138298 1.0111270 3.876207 0.573342 3.843140 2.1991052 4.942155 0.7784024 6.783383 2.595116 28.95243 1.078307
4: B D 0.8843948 0.9384975 1.988908 1.0543981 3.673393 1.3505701 3.957534 1.097837 2.788119 1.9089660 6.463784 0.7642144 6.416487 2.041441 27.88205 3.807119

但是,最好以长格式存储此文件。为此,您可以使用 data.tablemelt函数,如下所示:
cols <- names(DT)[3:10]
DT2 <- melt(DT[, as.list(unlist(lapply(.SD, function(x) list(mn = mean(x), sdev = sd(x))))), by = .(town,tc)],
id.vars = c('town','tc'),
measure.vars = patterns('.mn','.sdev'),
value.name = c('mn','sdev'))[, variable := cols[variable]]

或更简单的操作:
DT2 <- melt(DT, id.vars = c('town','tc'))[, .(mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)]

结果是:
> DT2
town tc variable mn sdev
1: A C one 0.2981842 0.3556520
2: A D one 1.2099018 1.0205252
3: B C one 0.9238309 0.6679821
4: B D one 0.8843948 0.9384975
5: A C two 1.5781743 0.7788545
6: A D two 1.6860675 1.5497989
7: B C two 2.5254855 0.8054734
8: B D two 1.9889082 1.0543981
9: A C three 2.2323655 0.9047046
10: A D three 2.6710267 0.8323733
11: B C three 3.1382982 1.0111270
12: B D three 3.6733929 1.3505701
.....

为了回应您的最新评论,您可以按以下步骤检测异常值:
DT3 <- melt(DT, id.vars = c('town','tc'))
DT3[, `:=` (mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)
][, outlier := +(value < mn - sdev | value > mn + sdev)]

这使:
     town tc variable      value         mn     sdev outlier
1: A C one 0.5681578 0.2981842 0.355652 0
2: A D one 0.5528128 1.2099018 1.020525 0
3: A C one 0.5214274 0.2981842 0.355652 0
4: A D one 1.4171454 1.2099018 1.020525 0
5: A C one 0.5820994 0.2981842 0.355652 0
---
156: B D total 23.4462542 27.8820524 3.807119 1
157: B C total 30.5934956 28.9524305 1.078307 1
158: B D total 30.5618759 27.8820524 3.807119 0
159: B C total 27.5940307 28.9524305 1.078307 1
160: B D total 24.8378437 27.8820524 3.807119 0

关于r - 如何更改数据表中的几个列名,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37142181/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com