作者热门文章
- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
给定 8 个样本 (A1-A8) 的以下数据:
A1 A2 A3 A4 A5 A6 A7 A8
T T T T T T T C
T C T T T T T C
A A A G G A A A
其中每一列是一个样本,每一行是一个可能编码为 A、C、G、T 的标记,我希望计算每一行 4 个等位基因中任何一个的起源的概率。例如,上述第 1 行数据的输出应该是
A C G T
A1 0 0 0 1/7
A2 0 0 0 1/7
A3 0 0 0 1/7
A4 0 0 0 1/7
A5 0 0 0 1/7
A6 0 0 0 1/7
A7 0 0 0 1/7
A8 0 1 0 0
由于第 1 行中有 7 个样本具有 T,因此每个样本的概率为 1/7。由于只有 A8 拥有 C,因此将 C 分配给 A8 的概率为 100%。对于第 3 行,输出应该是
A C G T
A1 1/6 0 0 0
A2 1/6 0 0 0
A3 1/6 0 0 0
A4 1/2 0 0 0
A5 1/2 0 0 0
A6 1/6 0 0 0
A7 1/6 0 0 0
A8 1/6 0 0 0
总输出应该是 i 个 8x4 矩阵的列表,其中 i 等于行数。
一个可重写的代码示例是:
states <- c("A1","A2","A3","A4","A5","A6","A7","A8") # Define the names of the states
A1 <- c("T","T","A") # Set the alleles for state A1 across 3 SNPs
A2 <- c("T","C","A") # Set the alleles for state A2 across 3 SNPs
A3 <- c("T","T","A") # Set the alleles for state A3 across 3 SNPs
A4 <- c("T","T","G") # Set the alleles for state A4 across 3 SNPs
A5 <- c("T","T","G") # Set the alleles for state A5 across 3 SNPs
A6 <- c("T","T","A") # Set the alleles for state A6 across 3 SNPs
A7 <- c("T","T","A") # Set the alleles for state A7 across 3 SNPs
A8 <- c("C","C","A") # Set the alleles for state A8 across 3 SNPs
theemissionmatrix <- matrix(t(c(A1,A2,A3,A4,A5,A6,A7,A8)), 8, 3, byrow = TRUE) # Create an 8 x 3 matrix
rownames(theemissionmatrix) <- states
theemissionmatrix # Print out the data matrix
[,1] [,2] [,3]
A1 "T" "T" "A"
A2 "T" "C" "A"
A3 "T" "T" "A"
A4 "T" "T" "G"
A5 "T" "T" "G"
A6 "T" "T" "A"
A7 "T" "T" "A"
A8 "C" "C" "A"
test <- cbind(theemissionmatrix[,1]=="A",theemissionmatrix[,1]=="C",theemissionmatrix[,1]=="G",theemissionmatrix[,1]=="T")
colnames(test) <- c("A","C","G","T")
test
[,1] [,2] [,3] [,4]
A1 FALSE FALSE FALSE TRUE
A2 FALSE FALSE FALSE TRUE
A3 FALSE FALSE FALSE TRUE
A4 FALSE FALSE FALSE TRUE
A5 FALSE FALSE FALSE TRUE
A6 FALSE FALSE FALSE TRUE
A7 FALSE FALSE FALSE TRUE
A8 FALSE TRUE FALSE FALSE
经过这一步,我不确定如何对每列的总计数求和并除以得到总概率。
最佳答案
这里有一些备选方案(从@zx8754 的回答中以“df”开头):
melt
+ table
+ prop.table
当存在 NaN
值时,显示效果不佳。
library(reshape2)
dfL <- melt(as.matrix(df))
Levs <- c("A", "C", "G", "T")
dfL$value <- factor(dfL$value, Levs) ## Just to be sure
prop.table(table(dfL[c(2, 3, 1)]), c(2, 3))
# , , Var1 = 1
#
# value
# Var2 A C G T
# A1 0.0000000 0.1428571
# A2 0.0000000 0.1428571
# A3 0.0000000 0.1428571
# A4 0.0000000 0.1428571
# A5 0.0000000 0.1428571
# A6 0.0000000 0.1428571
# A7 0.0000000 0.1428571
# A8 1.0000000 0.0000000
#
# , , Var1 = 2
#
# value
# Var2 A C G T
# A1 0.0000000 0.1666667
# A2 0.5000000 0.0000000
# ..... OUTPUT TRUNCATED
melt
+ by
+ table
可以很容易地显示 0
,否则 NaN
会显示。
dfL <- melt(as.matrix(df))
Levs <- c("A", "C", "G", "T")
dfL$value <- factor(dfL$value, Levs) ## Just to be sure
by(dfL[-1], dfL[1], FUN = function(x) {
A <- prop.table(table(x), 2)
A[is.nan(A)] <- 0
A
})
# Var1: 1
# value
# Var2 A C G T
# A1 0.0000000 0.0000000 0.0000000 0.1428571
# A2 0.0000000 0.0000000 0.0000000 0.1428571
# A3 0.0000000 0.0000000 0.0000000 0.1428571
# A4 0.0000000 0.0000000 0.0000000 0.1428571
# A5 0.0000000 0.0000000 0.0000000 0.1428571
# A6 0.0000000 0.0000000 0.0000000 0.1428571
# A7 0.0000000 0.0000000 0.0000000 0.1428571
# A8 0.0000000 1.0000000 0.0000000 0.0000000
# ------------------------------------------------------------------------
# Var1: 2
# value
# Var2 A C G T
# A1 0.0000000 0.0000000 0.0000000 0.1666667
# A2 0.0000000 0.5000000 0.0000000 0.0000000
# ..... OUTPUT TRUNCATED
lapply
+ table
在对数据进行一些重组之后完全坚持基础 R,这是另一种选择....
Levs <- c("A", "C", "G", "T")
out <- data.frame(N = names(df), t(df), row.names=NULL)
Rows <- setdiff(names(out), "N")
out[Rows] <- lapply(out[Rows], function(x) factor(x, Levs))
Tables <- lapply(seq_along(Rows), function(x) {
A <- prop.table(table(out[, 1], out[, Rows[x]]), 2)
A[is.nan(A)] <- 0
A
})
关于r - 使用 SNP 等位基因数据为 ACGT 创建概率矩阵,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20052057/
给定 8 个样本 (A1-A8) 的以下数据: A1 A2 A3 A4 A5 A6 A7 A8 T T T T T T T C T C T T T T T C A A A G G A A A 其中每一
我正在查看 Qt 特定的 C++ solution对于典型的生产者/消费者问题。这是生产者的代码: class Producer : public QThread { public: void
我是一名优秀的程序员,十分优秀!