
r - How to expand the dendrogram in heatmap.2

Reposted. Author: 行者123. Updated: 2023-12-02 02:43:16

I have the following heatmap with a dendrogram.

The complete data are here.

The problem is that the dendrogram on the left is squished. How can I expand it without changing the column size of the heatmap?

[Image: heatmap with the squished row dendrogram on the left]

It is generated with the following code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

# Read data
dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")]
dat.bcd <- dat.bcd

# Clustering and distance function
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")


# Select based on FC, as long as any of them >= anylim

anylim <- 2.0
dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


# Clustering functions
height <- clust.height;

# Define output file name
heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


# Compute the distance matrix and run the clustering
d.bcd <- distfunc(dat.bcd)
fit.bcd <- hclustfunc(d.bcd)


# Cluster by height
# cutree and rect.hclust have to be used in tandem
clusters <- cutree(fit.bcd, h=height)
nofclust.height <- length(unique(as.vector(clusters)));

myorder <- colnames(dat.bcd);
if (length(type.order)>0) {
myorder <- type.order
}

# Define colors
#hmcols <- rev(brewer.pal(11,"Spectral"));
hmcols <- rev(redgreen(2750));
selcol <- colorRampPalette(brewer.pal(12,"Set3"))
selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
sdcol= selcol(5);
clustcol.height = selcol2(nofclust.height);

# Plot heatmap
pdf(file=heatout,width=20,height=50); # for FC.lim >=2
heatmap.2(as.matrix(dat.bcd[,myorder]),Colv=FALSE,density.info="none",lhei=c(0.1,4),dendrogram="row",scale="row",RowSideColors=clustcol.height[clusters],col=hmcols,trace="none", margins=c(30,row.margins), hclustfun=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=0.3);
dev.off();


}
#--------------------------------------------------
# End of functions
#--------------------------------------------------

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);

Best answer

In your case the data have a long tail, which is expected for gene expression data (log-normal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As the quantile distribution shows, the most highly expressed genes span a range from 1.5 to over 300.

quantile(mat)

# 0% 25% 50% 75% 100%
# 0.000 0.769 1.079 1.544 346.230

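When hierarchical clustering is performed on unscaled data, the resulting dendrogram may be biased towards the values with the highest expression, as seen in your example. This calls for a log or z-score transformation ( reference ). Your dataset contains values == 0, which is a problem for a log transformation because log(0) is undefined (R returns -Inf).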
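
If a log transformation is preferred over z-scores, one common workaround for the zeros is to add a small pseudocount before taking the log. The sketch below uses a made-up matrix `toy` rather than the real data, and the pseudocount of 1 is an assumption, not something the dataset prescribes.

```r
# Hypothetical toy matrix with zeros and a long tail (not the real data)
toy <- matrix(c(0, 0.8, 1.1, 1.5, 346.2, 0), nrow = 2, byrow = TRUE)

log2(0)                    # -Inf: a non-finite value

# Assumed pseudocount of 1: zeros map to exactly 0 and everything stays finite
log.toy <- log2(toy + 1)

range(log.toy)
```

The choice of pseudocount changes how strongly small values are compressed, so it is worth checking the quantiles again after the transformation.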

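The z-score transformation ( reference ) is implemented in heatmap.2, but note that the function computes the distance matrix and runs the clustering algorithm before it scales the data. The option scale='row' therefore does not influence the clustering results; see my earlier post ( differences in heatmap/clustering defaults in R ) for more details.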
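
The effect of clustering before versus after scaling can be illustrated with base R alone. This is a minimal sketch on simulated data: the matrix `toy` and the factor of 100 used to create one dominant gene are assumptions made for illustration, while the distance and linkage settings mirror the ones used in the question.

```r
set.seed(1)
# Hypothetical expression-like matrix: 20 genes x 6 samples, log-normal
toy <- matrix(rlnorm(120, meanlog = 0, sdlog = 1), nrow = 20)
toy[1, ] <- toy[1, ] * 100   # one highly expressed gene creates a long tail

distfunc   <- function(x) dist(x, method = "maximum")
hclustfunc <- function(x) hclust(x, method = "complete")

# Clustering on the raw values, which is what scale = "row" leaves untouched
fit.raw <- hclustfunc(distfunc(toy))

# Scaling the rows first changes the input to the clustering itself
z     <- t(scale(t(toy)))
fit.z <- hclustfunc(distfunc(z))

# Merge heights relative to the tallest merge: values close to 0 mean
# branches that are squashed against the leaves of the dendrogram
summary(fit.raw$height / max(fit.raw$height))
summary(fit.z$height / max(fit.z$height))
```

On the raw values a single extreme gene sets the height scale of the whole tree, so most merges sit near the bottom; after row scaling the merge heights are spread over a much larger part of the range.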

I suggest scaling the data before running heatmap.2:

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat)))

quantile(z)

# 0% 25% 50% 75% 100%
# -2.1843994 -0.6646909 -0.2239677 0.3440102 2.2640027

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5)

library(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
heatmap.2(z, trace='none', dendrogram='row', Colv=FALSE, scale='none',
hclustfun=hclustfunc, distfun=distfunc, col=greenred(256), symbreaks=TRUE,
margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
RowSideColors=as.character(clusters))
dev.off()
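
Passing `as.character(clusters)` relies on R's default palette. If explicit colours are wanted, the cluster IDs can be mapped to a palette first, much like `clustcol.height[clusters]` in the question. The following is only a sketch on simulated data; `toy`, `cluster.cols` and `row.cols` are hypothetical names, and `rainbow()` stands in for whatever palette is preferred.

```r
set.seed(1)
# Hypothetical data: 10 genes x 6 samples
toy <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), NULL))

fit      <- hclust(dist(toy, method = "maximum"), method = "complete")
clusters <- cutree(fit, k = 3)   # named integer vector, in the row order of toy

# One colour per cluster ID; rainbow() is base R, any palette would do
cluster.cols <- rainbow(length(unique(clusters)))
row.cols     <- cluster.cols[clusters]

table(clusters)                  # cluster sizes
head(data.frame(cluster = clusters, colour = row.cols))
```

The vector `row.cols` is in the original row order of the matrix, which is the order heatmap.2 expects for `RowSideColors` before it reorders the rows itself.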

Also see the other posts here and here, which explain how to set the heatmap layout via the lmat, lwid and lhei parameters.
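
Because lwid and lhei are relative sizes, it can help to convert them into fractions of the page before tweaking them. The sketch below reuses the layout values from the answer; the variable names `width.share`, `height.share` and `dendro.col` are hypothetical, and the panel numbering in the comment follows my reading of how heatmap.2 numbers its panels when RowSideColors is set.

```r
# Layout values taken from the heatmap.2 call above
lmat <- rbind(c(5, 0, 4), c(3, 1, 2))
lwid <- c(1, 0.05, 1)
lhei <- c(0.03, 1)

# Widths and heights are relative, so each entry's share of the page is
width.share  <- lwid / sum(lwid)
height.share <- lhei / sum(lhei)

# With RowSideColors set, panel 1 is the colour strip, 2 the heatmap,
# 3 the row dendrogram, 4 the column dendrogram and 5 the colour key
dendro.col <- which(lmat == 3, arr.ind = TRUE)[, "col"]

# Fraction of the page width given to the row dendrogram
round(width.share[dendro.col], 3)
```

Raising the first entry of lwid relative to the others widens the dendrogram panel. Since the shares always sum to 1, the heatmap panel shrinks accordingly unless the width passed to pdf() is increased as well.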

The resulting heatmap looks like this (row and column labels omitted):

[Image: resulting heatmap with the expanded row dendrogram]

Regarding r - How to expand the dendrogram in heatmap.2, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21983162/
