
R - Calculating the difference between consecutive pairs of rows in a data frame

Reposted. Author: 行者123. Updated: 2023-12-04 17:53:13

I have a data.frame in which each gene name is repeated and holds the values for 2 conditions:

df <- data.frame(gene=c("A","A","B","B","C","C"),
condition=c("control","treatment","control","treatment","control","treatment"),
count=c(10, 2, 5, 8, 5, 1),
sd=c(1, 0.2, 0.1, 2, 0.8, 0.1))

  gene condition count  sd
1    A   control    10 1.0
2    A treatment     2 0.2
3    B   control     5 0.1
4    B treatment     8 2.0
5    C   control     5 0.8
6    C treatment     1 0.1

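I want to work out whether "count" increases or decreases after treatment, then label the genes accordingly and/or subset them. That is (pseudocode):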
for each unique(gene) do
    # geneRow1 is the control row, geneRow2 is the treatment row
    if df[geneRow2,3] - df[geneRow1,3] > 0 then gene is "up"
    else gene is "down"

This is what the end result should look like (the last column is optional):
up-regulated
gene condition count sd regulation
B control 5 0.1 up
B treatment 8 2.0 up

down-regulated
gene condition count sd regulation
A control 10 1.0 down
A treatment 2 0.2 down
C control 5 0.8 down
C treatment 1 0.1 down

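I have been racking my brain over this, including playing around with ddply, but I have not found a solution. Help for a hapless biologist, please.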

Cheers.

Best answer

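A plyr solution would look something like this:

library(plyr)

# Label every row of one gene by the direction of change from control to treatment
reg.fun <- function(x) {
  # A positive difference means the count rose after treatment
  reg.diff <- x$count[x$condition == 'treatment'] - x$count[x$condition == 'control']
  x$regulation <- ifelse(reg.diff > 0, 'up', 'down')

  x
}

# Apply reg.fun to each per-gene subset and recombine the rows
ddply(df, .(gene), reg.fun)


  gene condition count  sd regulation
1    A   control    10 1.0       down
2    A treatment     2 0.2       down
3    B   control     5 0.1         up
4    B treatment     8 2.0         up
5    C   control     5 0.8       down
6    C treatment     1 0.1       down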
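
The same per-gene labelling can also be sketched in base R, without plyr. This is only a minimal sketch under some assumptions: "up" is taken to mean that the treatment count is higher than the control count (as in the expected output in the question), every gene has exactly one control row and one treatment row, and a tie would be labelled "down". The names signed, delta and by.reg are made up for illustration.

```r
df <- data.frame(gene = c("A", "A", "B", "B", "C", "C"),
                 condition = c("control", "treatment", "control",
                               "treatment", "control", "treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Give treatment counts a plus sign and control counts a minus sign
signed <- ifelse(df$condition == "treatment", df$count, -df$count)

# Summing the signed counts within each gene gives treatment - control;
# ave() repeats that per-gene value on every row of the gene
delta <- ave(signed, df$gene, FUN = sum)

# Assumption: a positive change means "up"; zero or negative means "down"
df$regulation <- ifelse(delta > 0, "up", "down")

# Split into one data frame per label for the requested subsets
by.reg <- split(df, df$regulation)
```

With this, by.reg$up and by.reg$down hold the up-regulated and down-regulated rows respectively.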

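You could also consider doing this with a different package and/or with the data in a different shape:
# One row per gene, with count and sd columns for each condition
df.w <- reshape(df, direction='wide', idvar='gene', timevar='condition')

library(data.table)
DT <- data.table(df.w, key='gene')

# A positive difference means the count rose after treatment
DT[, regulation := ifelse(count.treatment - count.control > 0, 'up', 'down'), by=gene]
DT

   gene count.control sd.control count.treatment sd.treatment regulation
1:    A            10        1.0               2          0.2       down
2:    B             5        0.1               8          2.0         up
3:    C             5        0.8               1          0.1       down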
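
If the label is computed on the wide data, it can be carried back to the original long data frame by looking up each row's gene. The following is a rough base R sketch rather than part of the answer above. It assumes that "up" means the treatment count exceeds the control count, and it uses match() for the lookup. The wide.reg column name is hypothetical.

```r
df <- data.frame(gene = c("A", "A", "B", "B", "C", "C"),
                 condition = c("control", "treatment", "control",
                               "treatment", "control", "treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Wide shape: one row per gene, columns count.control, sd.control, ...
df.w <- reshape(df, direction = "wide", idvar = "gene", timevar = "condition")

# Assumption: treatment above control counts as "up"
df.w$wide.reg <- ifelse(df.w$count.treatment - df.w$count.control > 0,
                        "up", "down")

# match() finds, for each long row, the position of its gene in df.w
df$regulation <- df.w$wide.reg[match(df$gene, df.w$gene)]
```

This keeps the original four columns intact and only appends the optional regulation column.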

Regarding "R - Calculating the difference between consecutive pairs of rows in a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12539248/
