r - Improving my R code to compute Z-scores for a data frame

Reposted. Author: 行者123. Updated: 2023-12-03 16:29:50

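I have been tasked with writing R code that computes Z-scores and then writes them to a file. The script works, but I have questions about a few lines that confuse me.
Input.txt: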

GeneID  GeneID-2  GeneName  TSS-ID  Locus-ID  Sample1      Sample2      Sample3      Sample4      Sample5
ID1     X1        Zranb2    TSS1    Loc1      22.49161667  14.7231      19.62885833  26.16171667  39.3109
ID2     X2        Lphn2     TSS2    Loc2      6.439735     5.920786667  8.883331667  7.696353333  10.46969333
ID3     X3        Rpf1      TSS3    Loc3      30.67975     20.93751667  27.30251667  31.55653333  58.57418333
ID4     X4        Ctbs      TSS4    Loc4      1.916071667  1.943611667  2.696701667  3.130295     2.74012
ID5     X5        Spata1    TSS5    Loc5      0.715265667  0.3318745    0.4183155    0.961065833  1.10731
ID6     X6        Sap30bp   TSS6    Loc6      21.65946667  23.84386667  28.39683333  25.32866667  26.96016667
ID7     X7        Recql5    TSS7    Loc7      7.541321667  4.674345     4.40599      3.24996      3.327395
ID8     X8        Itgb4     TSS8    Loc8      37.3442      51.58868333  51.58868333  44.84458333  42.44406667

I would like to generate Z-scores for the data columns, from Sample1 through the last column. Here is the R code I wrote:

df <- read.table("Input.txt", row.names = 1, header = TRUE, sep = "\t", na.strings = "NA")
x <- df[, 5:ncol(df)]  # selects column 5 onward (GeneID became the row names), so just the data
# Open the result matrix. First issue: I used "0" because I saw other people on forums doing that,
# but I don't know its significance. It worked for me, so I kept it. Can anyone comment on this?
p <- matrix(0, nrow(x), ncol(x))
# Loop over rows and columns
for (i in 1:nrow(x)) {
  for (j in 1:ncol(x)) {
    p[i, j] <- (x[i, j] - rowMeans(x[i, ])) / sd(x[i, ])
  }
}
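
On the `0` in the `matrix(...)` call: it is only the initial fill value used to preallocate the result, and the loop overwrites every cell, so another placeholder such as `NA_real_` would give the same final matrix. A minimal sketch with made-up dimensions and values (the names `p_zero` and `p_na` are illustrative, not from the original post):

```r
# Preallocate two 2 x 3 numeric matrices that differ only in their initial fill value
p_zero <- matrix(0, nrow = 2, ncol = 3)
p_na   <- matrix(NA_real_, nrow = 2, ncol = 3)

# Overwrite every cell, as the nested loop in the question does
for (i in 1:2) {
  for (j in 1:3) {
    p_zero[i, j] <- i * 10 + j
    p_na[i, j]   <- i * 10 + j
  }
}

# The initial fill value no longer matters once every cell has been assigned
identical(p_zero, p_na)
```

Filling with `NA_real_` has the small advantage that a cell the loop accidentally skips stays visibly missing instead of looking like a genuine Z-score of 0.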

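The script above generates the matrix successfully. Is there a way to optimize it, or is this a reasonable approach? It is somewhat slow on my large dataset, but it gets the job done.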

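When I write the output file, my header changes. My goal is to output the first column of df as row names and each sample name as a column header. To do that, I used: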
rownames(p) <-rownames(df)
colnames(p) <- colnames(df[,5:ncol(df)])
write.table(p, file = "Zscore.txt", append = FALSE, quote = FALSE, sep = "\t", row.names = TRUE, col.names = TRUE)

The output file looks like this:

sample 1 sample 2 sample 3 sample 4 sample 5
ID1 -0.212153637 -1.048074183 -0.520196808 0.182762424 1.597662204
ID2 -0.780453984 -1.061276795 0.541869723 -0.100449696 1.400310753
ID3 -0.216506298 -0.890314297 -0.450087937 -0.1558648 1.712773332
ID4 -1.064932662 -1.013415279 0.395343854 1.206440228 0.476563859
ID5 0.02537058 -1.119050742 -0.861024653 0.759083238 1.195621576
ID6 -1.35974252 -0.52968526 1.200411349 0.03452872 0.654487711

The header is shifted to the left. Also, if I wanted to include all of the first 5 columns of df in the output file, how would I do that?
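
One possible way to handle both points, sketched on a small made-up data frame (the names `df_demo`, `z_demo`, `out_demo` and the temporary file are illustrative, not from the original post): `write.table` documents that `col.names = NA` together with `row.names = TRUE` writes a blank header field above the row names, which keeps the header aligned with the data. The annotation columns can be kept by binding them to the Z-scores with `cbind` before writing.

```r
# Hypothetical stand-in for df as read with row.names = 1
df_demo <- data.frame(
  GeneName = c("Zranb2", "Lphn2"),
  Sample1  = c(22.5, 6.4),
  Sample2  = c(14.7, 5.9),
  Sample3  = c(19.6, 8.9),
  row.names = c("ID1", "ID2")
)

# Row-wise Z-scores on the sample columns only
samples <- as.matrix(df_demo[, 2:ncol(df_demo)])
z_demo  <- (samples - rowMeans(samples)) / apply(samples, 1, sd)

# Keep the annotation column next to the Z-scores
out_demo <- cbind(df_demo["GeneName"], z_demo)

# col.names = NA adds an empty header field for the row-name column
out_file <- tempfile(fileext = ".txt")
write.table(out_demo, file = out_file, quote = FALSE, sep = "\t",
            row.names = TRUE, col.names = NA)

# Inspect the header line that was written
readLines(out_file)[1]
```

With the real data, `df[, 1:4]` would take the place of `df_demo["GeneName"]`, since the first column of the file has already become the row names.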

Finally, could you explain the difference between my Z-score calculation above and the scale function discussed in other questions?

Best answer

We can use rowSds from matrixStats and do the calculation in one step.

library(matrixStats)
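dfN <- df[6:ncol(df)]  # sample columns; index 6 assumes df was read without row.names = 1 (the output below has row names 1 to 8)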
(dfN-rowMeans(dfN))/(rowSds(as.matrix(dfN)))[row(dfN)]
# Sample1 Sample2 Sample3 Sample4 Sample5
#1 -0.21215364 -1.04807418 -0.5201968 0.18276242 1.5976622
#2 -0.78045398 -1.06127680 0.5418697 -0.10044970 1.4003108
#3 -0.21650630 -0.89031430 -0.4500879 -0.15586480 1.7127733
#4 -1.06493266 -1.01341528 0.3953439 1.20644023 0.4765639
#5 0.02537058 -1.11905074 -0.8610247 0.75908324 1.1956216
#6 -1.35974252 -0.52968526 1.2004113 0.03452872 0.6544877
#7 1.66627789 0.01983708 -0.1342732 -0.79815548 -0.7536863
#8 -1.34013679 0.98280311 0.9828031 -0.11700084 -0.5084686

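A base R only approach is
res <- t(scale(t(dfN)))  # scale() works column-wise, so transpose before and after
attributes(res)[3:4] <- NULL  # drop the "scaled:center" and "scaled:scale" attributes added by scale()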
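
On how the manual calculation relates to `scale`: by default `scale` centers each column on its mean and divides by its standard deviation (the `sd` with the n - 1 denominator), so applying it to the transpose yields the same row-wise Z-scores as `(x - rowMeans(x)) / sd`. A quick check on the first two rows of the sample data (the names `m`, `z_manual` and `z_scale` are illustrative):

```r
# First two rows of the sample columns from Input.txt
m <- matrix(
  c(22.49161667, 14.7231,     19.62885833, 26.16171667, 39.3109,
    6.439735,    5.920786667, 8.883331667, 7.696353333, 10.46969333),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ID1", "ID2"), paste0("Sample", 1:5))
)

# Manual row-wise Z-score: subtract the row mean, divide by the row standard deviation
z_manual <- (m - rowMeans(m)) / apply(m, 1, sd)

# scale() standardizes columns, so transpose before and after
z_scale <- t(scale(t(m)))

# Compare values only; scale() attaches extra "scaled:*" attributes
all.equal(z_manual, z_scale, check.attributes = FALSE)
```

The two results differ only in the extra attributes that `scale` attaches, which is why the answer removes them afterwards.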

Regarding "r - Improving my R code to compute Z-scores for a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34707527/
