
mysql - Perl (or R, or SQL): Count how often string appears across columns

Reposted. Author: 可可西里. Updated: 2023-11-01 06:27:57

I have a text file that looks like this:

gene1   gene2   gene3
a       d       c
b       e       d
c       f       g
d       g
        h
        i

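(Each column is a human gene, and each contains a variable number of proteins (strings, shown here as letters) that can bind to that gene.)

What I want to do is count how many columns each string appears in, and output that number together with all the matching column headers, like this: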

a   1   gene1
b   1   gene1
c   2   gene1 gene3
d   3   gene1 gene2 gene3
e   1   gene2
f   1   gene2
g   2   gene2 gene3
h   1   gene2
i   1   gene2

I have been trying to figure out how to do this in Perl and in R, but so far without success. Thanks for your help.

Best answer

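This solution feels like a bit of a hack, but it gives the desired output. It relies on both the plyr and reshape packages, though I'm sure you could find base R alternatives. The trick is that the melt function lets us flatten the data into long format, with one row per (column name, string) pair, and from that point on it is (fairly) easy to manipulate.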

library(reshape)
library(plyr)

#Recreate your data
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
                  gene2 = letters[4:9],
                  gene3 = c("c", "d", "g", NA, NA, NA)
)

#Melt the data. You'll need to update this if you have more columns
dat.m <- melt(dat, measure.vars = 1:3)

#Tabulate counts
counts <- as.data.frame(table(dat.m$value))

#I'm not sure what to call this column since it's a smooshing of column names
otherColumn <- ddply(dat.m, "value", function(x) paste(x$variable, collapse = " "))

#Merge the two together. You could fix the column names above, or just deal with it here
merge(counts, otherColumn, by.x = "Var1", by.y = "value")

This gives:

> merge(counts, otherColumn, by.x = "Var1", by.y = "value")
  Var1 Freq                V1
1    a    1             gene1
2    b    1             gene1
3    c    2       gene1 gene3
4    d    3 gene1 gene2 gene3
....
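
The answer mentions that a base R alternative should exist. The following is one possible sketch of that idea, not the answerer's own code: it swaps melt for stack() and ddply for split(), and the names long, cols_by_value, result, n_columns and columns are made up for illustration. It also assumes that empty cells are read in as NA.

```r
# Recreate the example data (NA marks an empty cell).
# stringsAsFactors = FALSE keeps the columns as character vectors,
# which stack() requires.
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
                  gene2 = letters[4:9],
                  gene3 = c("c", "d", "g", NA, NA, NA),
                  stringsAsFactors = FALSE)

# stack() flattens the columns into long format:
# `values` holds the strings, `ind` the column each string came from.
long <- stack(dat)

# Drop empty cells, then drop repeats of a string within the same column
# so that each column is counted at most once per string.
long <- unique(long[!is.na(long$values), ])

# For each string, collect the names of the columns it appears in.
cols_by_value <- split(as.character(long$ind), long$values)

# One row per string: the string, the number of columns, the column names.
result <- data.frame(value = names(cols_by_value),
                     n_columns = lengths(cols_by_value),
                     columns = vapply(cols_by_value, paste, character(1),
                                      collapse = " "),
                     row.names = NULL,
                     stringsAsFactors = FALSE)
print(result)
```

For a real tab-delimited file, a call along the lines of read.delim("genes.txt", na.strings = "", stringsAsFactors = FALSE) would presumably produce a comparable data frame; the file name here is hypothetical. Because of the unique() step, a string that is repeated within one column is still counted once for that column, which differs from tabulating the raw long data.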

Regarding mysql - Perl (or R, or SQL): Count how often string appears across columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/6935471/
