r - Hierarchical (taxonomic) data to a dendrogram

Reposted. Author: 行者123. Last updated: 2023-12-03 15:27:40

Data
I have the following (simplified) dataset, which I will call df from now on:

    species                      rank     value
1   Pseudomonas putida           family   Pseudomonadaceae
2   Pseudomonas aeruginosa       family   Pseudomonadaceae
3   Enterobacter xiangfangensis  family   Enterobacteriaceae
4   Salmonella enterica          family   Enterobacteriaceae
5   Klebsiella pneumoniae        family   Enterobacteriaceae
6   Pseudomonas putida           genus    Pseudomonas
7   Pseudomonas aeruginosa       genus    Pseudomonas
8   Enterobacter xiangfangensis  genus    Enterobacter
9   Salmonella enterica          genus    Salmonella
10  Klebsiella pneumoniae        genus    Klebsiella
11  Pseudomonas putida           species  Pseudomonas putida
12  Pseudomonas aeruginosa       species  Pseudomonas aeruginosa
13  Enterobacter xiangfangensis  species  Enterobacter hormaechei
14  Salmonella enterica          species  Salmonella enterica
15  Klebsiella pneumoniae        species  Klebsiella pneumoniae

What I want to achieve

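This is taxonomy data giving the classification of each species, where rank follows the order family > genus > species. Because the data is hierarchical, I would like to display it as a tree, preferably in ggplot2, like this: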
enter image description here

What I have tried
I found a package, taxize, that converts this (actually the full classification; only part of it is shown here) into a tree with class2tree:
library(taxize)
class.dat <- classification(c("Pseudomonas putida", "Pseudomonas aeruginosa", "Enterobacter xiangfangensis", "Salmonella enterica", "Klebsiella pneumoniae"), db = 'ncbi')
taxize::class2tree(class.dat)

Unlike my hand-made diagram, this does not show the ranks, which I need in my visualization:

enter image description here

Edit: dput output of the data
structure(list(species = c("Pseudomonas putida", "Pseudomonas putida", 
"Pseudomonas putida", "Pseudomonas aeruginosa", "Pseudomonas aeruginosa",
"Pseudomonas aeruginosa", "Enterobacter xiangfangensis", "Enterobacter xiangfangensis",
"Enterobacter xiangfangensis", "Salmonella enterica", "Salmonella enterica",
"Salmonella enterica", "Klebsiella pneumoniae", "Klebsiella pneumoniae",
"Klebsiella pneumoniae"), rank = c("family", "genus", "species",
"family", "genus", "species", "family", "genus", "species", "family",
"genus", "species", "family", "genus", "species"), value = c("Pseudomonadaceae",
"Pseudomonas", "Pseudomonas putida", "Pseudomonadaceae", "Pseudomonas",
"Pseudomonas aeruginosa", "Enterobacteriaceae", "Enterobacter",
"Enterobacter hormaechei", "Enterobacteriaceae", "Salmonella",
"Salmonella enterica", "Enterobacteriaceae", "Klebsiella", "Klebsiella pneumoniae"
)), row.names = c(NA, -15L), class = "data.frame", .Names = c("species",
"rank", "value"))

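Edit: reply to @StupidWolf
I was able to convert class.dat to a data frame, and then to a parent-child data frame to use as input for ggraph. The only thing left is to get x-axis labels, which in this case would be the interest vector. However, I am not sure whether that is possible in ggraph: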
library(taxize); library(dplyr); library(tidyr); library(igraph); library(ggraph)

# Retrieve data
class.dat <- classification(c("Pseudomonas putida", "Pseudomonas aeruginosa", "Enterobacter xiangfangensis", "Salmonella enterica", "Klebsiella pneumoniae"), db = 'ncbi')

# Specify interest
interest <- c('superkingdom', 'phylum','class','order','genus','species')

# Convert to wide matrix
df2 <- bind_rows(class.dat, .id = "column_label") %>%
  dplyr::select(-id) %>%
  filter(rank %in% interest) %>%
  spread(rank, name) %>%
  dplyr::select(-column_label) %>%
  dplyr::select(all_of(interest)) %>% # we need the order
  as.matrix()

# Empty parent-child matrix
parent.child <- matrix(nrow = 0, ncol = 2)

# Add each pair of adjacent rank columns as parent-child rows
for (i in 1:(ncol(df2) - 1)) {
  parent.child <- rbind(parent.child, df2[, c(i, i + 1)])
}

# To data frame and add colnames
parent.child <- as.data.frame(parent.child)
colnames(parent.child) <- c('from', 'to')

# Convert this to a ggraph
g <- graph_from_data_frame(parent.child)
ggraph(g, layout = 'dendrogram', circular = FALSE) +
  geom_edge_link() +
  geom_node_label(aes(label = names(V(g))), size = 3, nudge_y = -0.1) +
  scale_y_reverse(labels = interest) + coord_flip() +
  theme_classic()

Best answer

Then we build the hierarchy as an edge list, in the way it is done for hierarchical edge bundling:

# Edge list: origin -> family -> genus -> species
d1 <- data.frame(from = "origin", to = c("Pseudomonadaceae", "Enterobacteriaceae"))
d2 <- data.frame(
  from = c("Pseudomonadaceae", "Pseudomonadaceae", "Enterobacteriaceae", "Enterobacteriaceae", "Enterobacteriaceae"),
  to = c("Pseudomonas", "Pseudomonas", "Enterobacter", "Salmonella", "Klebsiella"))
d3 <- data.frame(
  from = c("Pseudomonas", "Pseudomonas", "Enterobacter", "Salmonella", "Klebsiella"),
  to = c("Pseudomonas putida", "Pseudomonas aeruginosa", "Enterobacter hormaechei", "Salmonella enterica", "Klebsiella pneumoniae"))

# unique() drops the repeated Pseudomonadaceae -> Pseudomonas edge
hierarchy <- unique(rbind(d1, d2, d3))

vertices <- data.frame(name = unique(c(as.character(hierarchy$from), as.character(hierarchy$to))))
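The edge list above is typed out by hand. As a rough sketch, the same parent-child pairs could probably be derived from the long-format df in the question using base R only. The names ranks, wide and edges are introduced here for illustration, and the artificial "origin" root follows the answer's convention; the data frame is rebuilt inline so the block runs on its own.

```r
# df: the long-format data from the question (species, rank, value).
df <- data.frame(
  species = rep(c("Pseudomonas putida", "Pseudomonas aeruginosa",
                  "Enterobacter xiangfangensis", "Salmonella enterica",
                  "Klebsiella pneumoniae"), each = 3),
  rank = rep(c("family", "genus", "species"), times = 5),
  value = c("Pseudomonadaceae", "Pseudomonas", "Pseudomonas putida",
            "Pseudomonadaceae", "Pseudomonas", "Pseudomonas aeruginosa",
            "Enterobacteriaceae", "Enterobacter", "Enterobacter hormaechei",
            "Enterobacteriaceae", "Salmonella", "Salmonella enterica",
            "Enterobacteriaceae", "Klebsiella", "Klebsiella pneumoniae"),
  stringsAsFactors = FALSE
)

# Ranks from the top of the hierarchy to the bottom.
ranks <- c("family", "genus", "species")

# Long -> wide: one row per queried species, one column per rank.
ids <- unique(df$species)
wide <- sapply(ranks, function(r) {
  rows <- df[df$rank == r, ]
  rows$value[match(ids, rows$species)]
})

# Prepend an artificial root so the result is a single tree.
wide <- cbind(origin = "origin", wide)

# Each pair of adjacent columns gives one level of parent -> child edges.
edges <- do.call(rbind, lapply(seq_len(ncol(wide) - 1), function(i) {
  data.frame(from = wide[, i], to = wide[, i + 1], stringsAsFactors = FALSE)
}))

# Shared ancestors produce repeated edges; keep each edge once.
edges <- unique(edges)
rownames(edges) <- NULL
```

The resulting edges data frame has the same from/to layout as hierarchy, so it can be passed to graph_from_data_frame() in the same way.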

Then we can either plot it with igraph:
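library(igraph)
g <- graph_from_data_frame(hierarchy, vertices = vertices)
# layout_as_tree() is the current name of layout.reingold.tilford()
lay <- layout_as_tree(g)
par(mar = c(0, 0, 0, 0))
# Swap and negate the coordinates so the tree grows from left to right
plot(g, layout = -lay[, 2:1], vertex.label.cex = 0.7,
     vertex.size = 1, edge.arrow.size = 0.4)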

enter image description here
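The question asks for the ranks to be visible in the plot. One possible sketch, not part of the original answer, is to compute each vertex's depth from the root with igraph's distances() and map that depth onto a rank name. The rank_names vector and the "[rank]" label format are assumptions made for illustration, and the mapping only holds when every level of the hierarchy is present for every species.

```r
library(igraph)

# Same taxonomy as in the answer, written as one vector per rank.
family  <- c("Pseudomonadaceae", "Pseudomonadaceae", "Enterobacteriaceae",
             "Enterobacteriaceae", "Enterobacteriaceae")
genus   <- c("Pseudomonas", "Pseudomonas", "Enterobacter", "Salmonella", "Klebsiella")
species <- c("Pseudomonas putida", "Pseudomonas aeruginosa", "Enterobacter hormaechei",
             "Salmonella enterica", "Klebsiella pneumoniae")

hierarchy <- unique(rbind(
  data.frame(from = "origin", to = family),
  data.frame(from = family, to = genus),
  data.frame(from = genus, to = species)
))
g <- graph_from_data_frame(hierarchy)

# Depth of each vertex: number of edges on the directed path from the root.
depth <- distances(g, v = "origin", mode = "out")[1, ]

# Assumption: depth 0..3 corresponds to these ranks, in this order.
rank_names <- c("origin", "family", "genus", "species")
V(g)$rank <- rank_names[depth + 1]

# plot() uses the 'label' vertex attribute when it is present.
V(g)$label <- paste0(V(g)$name, " [", V(g)$rank, "]")

lay <- layout_as_tree(g)
plot(g, layout = -lay[, 2:1], vertex.label.cex = 0.7,
     vertex.size = 1, edge.arrow.size = 0.4)
```

Storing the rank as a vertex attribute also makes it available to other plotting code, for example to colour vertices by rank.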

Or with ggraph, like this:
library(ggraph)
ggraph(g, layout = 'dendrogram', circular = FALSE) +
  geom_edge_link() +
  # 'name' is the vertex name column that ggraph exposes for each node
  geom_node_label(aes(label = name), size = 2, nudge_y = -0.1) +
  scale_y_reverse() + coord_flip() + theme_void()

enter image description here
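The open point in the question is whether the rank names can be shown along the depth axis in ggraph. A possible sketch, which is not from the original answer: since ggraph returns an ordinary ggplot object, explicit breaks and labels can be passed to scale_y_reverse(). This assumes the default dendrogram layout places leaves at y = 0 and each level above them one unit higher; rank_labels and rank_breaks are names chosen for illustration.

```r
library(igraph)
library(ggraph)  # also attaches ggplot2

# Edge list from the answer (origin -> family -> genus -> species), deduplicated.
hierarchy <- data.frame(
  from = c("origin", "origin",
           "Pseudomonadaceae", "Enterobacteriaceae", "Enterobacteriaceae",
           "Enterobacteriaceae",
           "Pseudomonas", "Pseudomonas", "Enterobacter", "Salmonella", "Klebsiella"),
  to = c("Pseudomonadaceae", "Enterobacteriaceae",
         "Pseudomonas", "Enterobacter", "Salmonella", "Klebsiella",
         "Pseudomonas putida", "Pseudomonas aeruginosa", "Enterobacter hormaechei",
         "Salmonella enterica", "Klebsiella pneumoniae")
)
g <- graph_from_data_frame(hierarchy)

# Assumption: leaves are at height 0, so heights 0..3 map to these labels.
rank_labels <- c("species", "genus", "family", "origin")
rank_breaks <- seq_along(rank_labels) - 1

p <- ggraph(g, layout = "dendrogram", circular = FALSE) +
  geom_edge_link() +
  geom_node_label(aes(label = name), size = 2, nudge_y = -0.1) +
  # One break per tree level, labelled with the corresponding rank.
  scale_y_reverse(breaks = rank_breaks, labels = rank_labels) +
  coord_flip() +
  # theme_classic() keeps the axes visible; theme_void() would hide the labels.
  theme_classic()

print(p)
```

If some species lack a rank, the levels no longer line up one per unit of height, and the breaks would need to be worked out from the layout instead.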

Regarding "r - Hierarchical (taxonomic) data to a dendrogram", the corresponding question on Stack Overflow is: https://stackoverflow.com/questions/60904143/
