
performance - Improving the performance of visualizing overlapping segments

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 02:58:48

I have a set of x-coordinate pairs that I use to draw line segments along the x axis, in order to create a custom read map in R:

example read map

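Half of the task of drawing these segments is determining their y positions so that no two overlapping segments sit on the same y level. For each segment, I iterate over the y levels, starting from the first, until I reach a level that does not yet contain a segment overlapping the current one. I then record the current segment's end position at that level and move on to the next segment.

The actual code is a function, as follows: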

# Dummy data
# A list of start and end positions for each segment along the X axis. Sorted by start.
# Passing the function few.reads draws a map in half a second. Passing it many.reads takes about half an hour to complete.
few.reads <- data.frame( start=c(rep(10,150), rep(16,100), rep(43,50)), end=c(rep(30,150), rep(34,100), rep(57,50)) );
many.reads <- data.frame( start=c(rep(10,15000), rep(16,10000), rep(43,5000)), end=c(rep(30,15000), rep(34,10000), rep(57,5000)) );

#---
# A function to draw a series of overlapping segments (or "reads") along
# the x-axis. Where reads overlap, they are "stacked" down the y axis
#---
drawReads <- function(reads){

    # sort the reads by their start positions
    reads <- reads[order(reads$start),];

    # minimum and maximum for x axis
    minstart <- min(reads$start);
    maxend <- max(reads$end);

    # initialise yread: a vector holding, for each y level, the end of the last read placed there
    yread <- c(minstart - 1);
    ypos <- c(); # holds the y position of the ith segment

    #---
    # This iteration step is the bottleneck. Worst case, when all reads are stacked on top
    # of each other, it has to iterate over many y levels to find the correct position for
    # the later reads
    #---
    # iterate over segments
    for (r in seq_len(nrow(reads))){
        read <- reads[r,];
        start <- read$start;
        placed <- FALSE;

        # iterate through yread to find the next available
        # y pos at this x pos (start)
        y <- 1;
        while(!placed){

            if(yread[y] < start){
                ypos[r] <- y;
                yread[y] <- read$end;
                placed <- TRUE;
            }

            # current y pos is used by another segment, increment
            y <- y + 1;
            # initialize another y pos if we're at the end of the list
            if(y > length(yread)){
                yread[y] <- minstart-1;
            }
        }
    }

    #---
    # This is the plotting step
    # Once we are here the rest of the process is very quick
    #---
    # find the maximum y pos that is used to size up the plot
    maxy <- length(yread);
    miny <- 1;

    reads$ypos <- ypos + miny;

    print("New Plot...")
    # Now we have all the information, start the plot
    plot.new();
    plot.window(xlim=c(minstart, maxend+((maxend-minstart)/10)), ylim=c(1,maxy));

    axis(3,xaxp=c(minstart,maxend,(maxend-minstart)/10));
    axis(2, yaxp=c(miny,maxy,3),tick=FALSE,labels=FALSE);

    print("Draw the reads...");
    maxy <- max(reads$ypos);
    segments(reads$start, maxy-reads$ypos, reads$end, maxy-reads$ypos, col="blue");
}

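My actual dataset is very large and, as far as I can tell, contains regions that can have up to 600,000 reads. Reads naturally stack on top of each other, so it is easy to hit the worst case in which all reads overlap one another. The time it takes to draw a large number of reads is unacceptable to me, so I am looking for a way to make the process more efficient. Can I replace my loop with something faster? Is there an algorithm that can arrange the reads more quickly? At the moment I really cannot think of a better approach.

Thanks for your help.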

Best answer

Fill each y level greedily. Once a level has been filled, move down to the next level and never go back.

Pseudocode:

y <- 1
while segment-list.not-empty
    i <- 1
    while (i > 0)
        current <- segment-list[i]
        current.plot(y)
        segment-list.remove(i)
        # first remaining segment that starts after current.end; 0 if there is none
        i <- segment-list.find_first_greater(current.end)
    y <- y + 1

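This will not necessarily produce an "optimal" plot in any sense, but at least it is O(n log n), provided the sorted segment list supports search and removal in logarithmic time.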
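The pseudocode above could be sketched in R roughly as follows. This is only one possible reading of it, not the answerer's own code. The function name `assignLevels` and the helper vectors `nxt` and `firstAfter` are made up for illustration. Instead of physically removing segments from a list, the sketch keeps a pointer array that skips over already placed segments (a union-find style trick with path compression), and it precomputes `find_first_greater` for every segment with a single `findInterval` call. It assumes every segment has start <= end.

```r
# Assign a y level to each segment, filling one level at a time.
# Hypothetical helper, not part of the original question or answer.
# starts, ends: numeric vectors of equal length, with starts <= ends.
assignLevels <- function(starts, ends) {
    stopifnot(length(starts) == length(ends), all(starts <= ends))
    n <- length(starts)
    if (n == 0) return(integer(0))

    # work on the segments in order of start position
    ord <- order(starts)
    s <- starts[ord]
    e <- ends[ord]

    # firstAfter[i]: first sorted index whose start is strictly greater
    # than e[i]; n + 1 means "no such segment"
    firstAfter <- findInterval(e, s) + 1L

    # nxt[i] leads towards the smallest unplaced index >= i;
    # index n + 1 is a sentinel that is never placed
    nxt <- seq_len(n + 1L)

    level <- integer(n)
    placed <- 0L
    y <- 0L
    while (placed < n) {
        y <- y + 1L          # open the next level
        i <- 1L
        repeat {
            # follow the pointers to the first unplaced index >= i
            root <- i
            while (nxt[root] != root) root <- nxt[root]
            # path compression: point the visited indices straight at root
            while (nxt[i] != root) {
                up <- nxt[i]
                nxt[i] <- root
                i <- up
            }
            if (root > n) break      # nothing left that fits on this level
            level[root] <- y
            nxt[root] <- root + 1L   # mark as placed ("remove" from the list)
            placed <- placed + 1L
            i <- firstAfter[root]    # jump past the end of this segment
        }
    }

    # return the levels in the original (unsorted) order of the input
    out <- integer(n)
    out[ord] <- level
    out
}

# Usage with the small dummy data set from the question
few.reads <- data.frame(start = c(rep(10, 150), rep(16, 100), rep(43, 50)),
                        end   = c(rep(30, 150), rep(34, 100), rep(57, 50)))
few.reads$ypos <- assignLevels(few.reads$start, few.reads$end)
```

The resulting `ypos` column could then be passed to `segments()` in the same way as in the plotting step of the question. Precomputing `firstAfter` in one vectorised call is a deliberate choice: calling `findInterval` once per segment inside the loop would re-check the sorted vector on every call.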

Regarding "performance - Improving the performance of visualizing overlapping segments", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9871043/
