Plotting large time series

Reposted by: 行者123 · Updated: 2023-12-04 19:28:26

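Summary of question:

Are there any easy-to-implement algorithms for reducing the number of points needed to represent a time series without changing how it appears in a plot?

Motivating problem:

I am trying to interactively visualize 10 to 15 channels of data logged from an embedded system at ~20 kHz. Logs can cover more than an hour, which means I am dealing with between 1e8 and 1e9 points. In addition, I care about potentially small anomalies that last for very short periods (i.e. less than 1 ms), so simple decimation is not an option.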
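To make the scale concrete, the figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and the one-hour duration is taken as a round lower bound for "more than an hour".

```python
# Rough point count for the logging setup described in the question.
SAMPLE_RATE_HZ = 20_000      # ~20 kHz per channel
DURATION_S = 3_600           # one hour of logging (assumed round figure)
CHANNELS = (10, 15)          # lower and upper channel count

points_per_channel = SAMPLE_RATE_HZ * DURATION_S
total_points = [points_per_channel * c for c in CHANNELS]

# Number of samples spanned by a 1 ms anomaly at this rate.
samples_per_ms = SAMPLE_RATE_HZ // 1_000

print(points_per_channel)  # 72000000
print(total_points)        # [720000000, 1080000000]
print(samples_per_ms)      # 20
```

A 1 ms anomaly therefore covers only about 20 of the 7.2e7 samples in a channel, which is why keeping every k-th sample for any large k is likely to skip it entirely.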

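Not surprisingly, most plotting libraries get a little sad if you do the naive thing and hand them arrays of data larger than the dedicated GPU memory. On my system it is actually a bit worse than that: using a vector of random floats as a test case, I only get about 5e7 points out of the stock Matlab plotting function and Python + matplotlib before the refresh rate drops below 1 FPS.

Existing questions and solutions:

This problem is somewhat similar to a number of existing questions, such as: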

How to plot large data vectors accurately at all zoom levels in real time?

How to plot large time series (thousands of administration times/doses of a medication)?

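[several Cross Validated questions]

but it deals with larger data sets and/or is more stringent about fidelity at the cost of interactivity (60 FPS silky-smooth panning and zooming would be great, but realistically I would be happy with 1 FPS).

Clearly, some form of data reduction is needed. While searching for existing tools that solve my problem, I found two paradigms:

Decimate but track outliers: A good example is Matlab + dsplot (i.e. the tool suggested in the accepted answer to the first question linked above). dsplot decimates down to a fixed number of evenly spaced points, then adds back outliers identified using the standard deviation of a high-pass FIR filter. While this is probably a viable solution for several classes of data, it can run into difficulties if there is substantial frequency content past the filter cutoff frequency, and it may require tuning.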
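The "decimate but track outliers" idea can be sketched roughly as follows. This is not dsplot's actual implementation: the function name `decimate_with_outliers`, the threshold `k`, and the use of a first difference as the high-pass filter are all assumptions chosen to keep the example short.

```python
import statistics

def decimate_with_outliers(y, n_points, k=4.0):
    """Return sorted indices: evenly spaced samples plus flagged outliers.

    Assumes n_points >= 1. The high-pass filter is a plain first
    difference (FIR taps [1, -1]), which is an illustrative choice only.
    """
    n = len(y)
    if n <= n_points:
        return list(range(n))
    step = n / n_points
    keep = {int(i * step) for i in range(n_points)}
    # High-pass the signal so slow trends do not inflate the spread.
    hp = [y[i] - y[i - 1] for i in range(1, n)]
    mean = statistics.fmean(hp)
    sigma = statistics.pstdev(hp)
    for i, v in enumerate(hp, start=1):
        # A large jump between y[i-1] and y[i] marks both samples.
        if abs(v - mean) > k * sigma:
            keep.add(i - 1)
            keep.add(i)
    return sorted(keep)

# Flat signal with a single one-sample spike that uniform decimation misses.
signal = [0.0] * 1000
signal[537] = 10.0
idx = decimate_with_outliers(signal, 10)
print(537 in idx, len(idx))  # True 13
```

The threshold `k` is exactly the kind of tuning knob the question worries about: set it too low on broadband data and nearly every sample gets flagged, set it too high and short anomalies are dropped again.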

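Plot min and max: With this approach you divide the time series into intervals corresponding to each horizontal pixel and plot only the minimum and maximum values in each interval. Matlab + Plot (Big) is a good example, but it uses an O(n) calculation of min and max, which makes it a bit slow by the time you get to 1e8 or 1e9 points. A binary search tree in a mex function or in Python would solve this problem, but is complicated to implement.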
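A minimal sketch of the min/max approach, assuming one bin per horizontal pixel, is shown below. The function name `minmax_decimate` is hypothetical, and the loop is plain Python for readability; it is the O(n)-per-redraw variant, not the tree-based one.

```python
def minmax_decimate(y, n_bins):
    """Return sorted indices of the min and max sample in each of n_bins bins."""
    n = len(y)
    if n <= 2 * n_bins:
        # Fewer samples than output slots: nothing to reduce.
        return list(range(n))
    keep = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        i_min = min(range(lo, hi), key=y.__getitem__)
        i_max = max(range(lo, hi), key=y.__getitem__)
        # Emit the two indices in time order; a flat bin yields one index.
        keep.extend(sorted({i_min, i_max}))
    return keep

# Flat signal with one negative and one positive single-sample spike.
signal = [0.0] * 1000
signal[123] = -5.0
signal[537] = 10.0
idx = minmax_decimate(signal, 10)
print(idx)  # [0, 100, 123, 200, 300, 400, 500, 537, 600, 700, 800, 900]
```

Every extreme value survives regardless of how short it is, at a cost of at most two points per pixel column. The full scan on every pan or zoom is what becomes slow at 1e8 to 1e9 points.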

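Are there any simpler solutions that do what I want?

Edit (2018-02-18): Question refactored to focus on algorithms rather than on tools that implement algorithms.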

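Best answer

I had the very same problem when displaying pressure time series from hundreds of sensors, with samples every minute over several years. In some cases (for example when cleaning the data) I wanted to see all the outliers, while in others I was more interested in the trend. So I wrote a function that can reduce the number of data points using two methods: Visvalingam and Douglas-Peucker. The first tends to remove outliers, and the second keeps them. I have optimized the function to work on large data sets.
I did this after realizing that none of the plotting methods could handle that many points, and that the ones that could were decimating the data set in a way I could not control. The function is as follows: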

function [X, Y, indices, relevance] = lineSimplificationI(X,Y,N,method,option)
%lineSimplification Reduce the number of points of the line described by X
%and Y to N. Preserving the most relevant ones.
%   Using an adapted method of visvalingam and Douglas-Peucker algorithms.
%   The number of points of the line is reduced iteratively until reaching
%   N non-NaN points. Repeated NaN points in original data are deleted but
%   non-repeated NaNs are preserved to keep line breaks.
%   The two available methods are
%
%   Visvalingam: The relevance of a point is proportional to the area of
%   the triangle defined by the point and its two neighbors.
%   
%   Douglas-Peucker: The relevance of a point is proportional to the
%   distance between it and the straight line defined by its two neighbors.
%   Note that the implementation here is iterative and NOT recursive as in
%   the original algorithm. This allows it to handle large data sets better.
%
%   DIFFERENCES: Visvalingam tends to remove outliers while Douglas-Peucker
%   keeps them.
%
%   INPUTS:
%         X: X coordinates of the line points
%         Y: Y coordinates of the line points
%         N: Target number of points for the simplified line
%    method: Either 'Visvalingam' or 'DouglasPeucker' (default)
%    option: Either 'silent' (default) or 'verbose' if additional outputs
%            of the calculations are desired.
%
% OUTPUTS:
%         X: X coordinates of the simplified line points
%         Y: Y coordinates of the simplified line points
%   indices: Indices to the positions of the points preserved in the
%            original X and Y. Therefore Output X is equal to the input
%            X(indices).
% relevance: Relevance of the returned points. It can be used to further
%            simplify the line dynamically by keeping only points with
%            higher relevance. But this will produce bigger distortions of
%            the line shape than calling lineSimplification again with a
%            smaller value for N, as removing a point changes the relevance
%            of its neighbors.
%
% Implementation by Camilo Rada - camilo@rada.cl
%

    if nargin < 3
        error('Line points positions X, Y and target point count N MUST be specified');
    end
    if nargin < 4
        method='DouglasPeucker';
    end
    if nargin < 5
        option='silent';
    end

    doDisplay=strcmp(option,'verbose');

    X=double(X(:));
    Y=double(Y(:));
    indices=1:length(Y);

    if length(X)~=length(Y)
        error('Vectors X and Y MUST have the same number of elements');
    end

    if N>=length(Y)
        relevance=ones(length(Y),1);
        if doDisplay
            disp('N is greater than or equal to the number of points in the line. Original X,Y were returned. Relevances were not computed.')
        end
        return
    end
    % Removing repeated NaN from Y
    % We find all the NaNs with another NaN to the left
    repeatedNaNs= isnan(Y(2:end)) & isnan(Y(1:end-1));
    %We also consider a repeated NaN the first element if NaN
    repeatedNaNs=[isnan(Y(1)); repeatedNaNs(:)];
    Y=Y(~repeatedNaNs);
    X=X(~repeatedNaNs);
    indices=indices(~repeatedNaNs);

    %Removing trailing NaN if any
    if isnan(Y(end))
        Y=Y(1:end-1);
        X=X(1:end-1);
        indices=indices(1:end-1);
    end

    pCount=length(X);

    if doDisplay
        disp(['Initial point count = ' num2str(pCount)])
        disp(['Non repeated NaN count in data = ' num2str(sum(isnan(Y)))])
    end

    iterCount=0;

    while pCount>N
        iterCount=iterCount+1;
        % If the vertices of a triangle are at the points (x1,y1), (x2,y2) and
        % (x3,y3), the area of such a triangle is
        % area = abs((x1*(y2-y3)+x2*(y3-y1)+x3*(y1-y2))/2)
        % Now the areas of the triangles defined by each point of X,Y and its
        % two neighbors are

        twiceTriangleArea =abs((X(1:end-2).*(Y(2:end-1)-Y(3:end))+X(2:end-1).*(Y(3:end)-Y(1:end-2))+X(3:end).*(Y(1:end-2)-Y(2:end-1))));

        switch method
            case 'Visvalingam'
                % In this case the relevance is given by the area of the
                % triangle formed by each point and its two neighbors
                relevance=twiceTriangleArea/2;
            case 'DouglasPeucker'
                % In this case the relevance is given by the minimum distance
                % from the point to the line formed by its two neighbors
                neighborDistances=ppDistance([X(1:end-2) Y(1:end-2)],[X(3:end) Y(3:end)]);
                relevance=twiceTriangleArea./neighborDistances;
            otherwise
                error(['Unknown method: ' method]);
        end
        relevance=[Inf; relevance; Inf];
        %We remove the pCount-N least relevant points as long as they are not contiguous

        [srelevance, sortorder]= sort(relevance,'descend');
        firstFinite=find(isfinite(srelevance),1,'first');
        startPos=uint32(firstFinite+N+1);
        toRemove=sort(sortorder(startPos:end));
        if isempty(toRemove)
            break;
        end

        %Now we have to deal with contiguous elements, as removing one will
        %change the relevance of the neighbors. Therefore we have to
        %identify pairs of contiguous points and only remove the one with
        %lesser relevance

        %contiguousToKeep will be true for an element if the next or the
        %previous element is also flagged for removal
        contiguousToKeep=[diff(toRemove(:))==1; false] | [false; (toRemove(1:end-1)-toRemove(2:end))==-1];
        notContiguous=~contiguousToKeep;

        %And the relevances associated with the elements flagged for removal
        contRel=relevance(toRemove);

        % Now we rearrange the contiguous flags into two rows, so that if
        % both rows are true in a given column, we have a case of two
        % contiguous points that are both flagged for removal.
        % This process depends on the rearrangement, as contiguous
        % elements can end up in different columns, so it has to be done
        % twice to make sure no contiguous elements are removed
        nContiguous=length(contiguousToKeep);

        for paddingMode=1:2
            %The rearrangement is only possible if we have an even number of
            %elements, so we add one dummy element where needed
            if paddingMode==1
                if mod(nContiguous,2)
                    pcontiguous=[contiguousToKeep; false];
                    pcontRel=[contRel; -Inf];
                else
                    pcontiguous=contiguousToKeep;
                    pcontRel=contRel;
                end
            else
                if mod(nContiguous,2)
                    pcontiguous=[false; contiguousToKeep];
                    pcontRel=[-Inf; contRel];
                else
                    pcontiguous=[false; contiguousToKeep(1:end-1)];
                    pcontRel=[-Inf; contRel(1:end-1)];                    
                end
            end

            contiguousPairs=reshape(pcontiguous,2,[]);
            pcontRel=reshape(pcontRel,2,[]);

            %finding columns with contiguous elements
            contCols=all(contiguousPairs);
            if ~any(contCols) && paddingMode==2
                break;
            end
            %finding the row of the most relevant element of each column (max picks the point to keep)
            [~, lesserElementRow]=max(pcontRel);

            %The index in contiguousToKeep of the first element of each pair is
            if paddingMode==1
                firstElementIdx=((1:size(contiguousPairs,2))*2)-1;
            else
                firstElementIdx=((1:size(contiguousPairs,2))*2)-2;
            end            

            % and the index in contiguousToKeep of the most relevant element
            % of each pair is
            lesserElementIdx=firstElementIdx+lesserElementRow-1;

            %now we clear the flag of the most relevant element of each
            %pair, so it is dropped from the removal list and kept in the line
            contiguousToKeep(lesserElementIdx(contCols))=false;
        end
        %and now we delete the more relevant contiguous points from the
        %toRemove list
        toRemove=toRemove(contiguousToKeep | notContiguous);

        if any(diff(toRemove(:))==1) && doDisplay
            warning([num2str(sum(diff(toRemove(:))==1)) ' contiguous elements removed in one iteration.'])
        end
        toRemoveLogical=false(pCount,1);
        toRemoveLogical(toRemove)=true;

        X=X(~toRemoveLogical);
        Y=Y(~toRemoveLogical);
        indices=indices(~toRemoveLogical);

        pCount=length(X);
        nRemoved=sum(toRemoveLogical);
        if doDisplay
            disp(['Iteration ' num2str(iterCount) ', Point count = ' num2str(pCount) ' (' num2str(nRemoved) ' removed)'])
        end
        if nRemoved==0
            break;
        end
    end
end

function d = ppDistance(p1,p2)
    d=sqrt((p1(:,1)-p2(:,1)).^2+(p1(:,2)-p2(:,2)).^2);
end
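For readers working in Python, the two relevance measures used inside the loop above can be sketched as follows. This is only an illustrative port of the per-point relevance computation (the `twiceTriangleArea` and `neighborDistances` lines), not of the full iterative removal logic. The function name `relevance` is hypothetical, and the sketch assumes no two neighbors of a point coincide, which holds for a time series with strictly increasing X.

```python
import math

def relevance(x, y, method="DouglasPeucker"):
    """Relevance of each point; the two endpoints get infinity so they are kept."""
    if len(x) < 2:
        return [math.inf] * len(x)
    rel = [math.inf]
    for i in range(1, len(x) - 1):
        # Twice the area of the triangle formed by point i and its neighbors.
        twice_area = abs(
            x[i - 1] * (y[i] - y[i + 1])
            + x[i] * (y[i + 1] - y[i - 1])
            + x[i + 1] * (y[i - 1] - y[i])
        )
        if method == "Visvalingam":
            rel.append(twice_area / 2)
        elif method == "DouglasPeucker":
            # Height of the triangle: distance from point i to the line
            # through its two neighbors (twice the area over the base).
            base = math.hypot(x[i + 1] - x[i - 1], y[i + 1] - y[i - 1])
            rel.append(twice_area / base)
        else:
            raise ValueError(f"Unknown method: {method}")
    rel.append(math.inf)
    return rel

xs = [0, 1, 2, 3, 4]
ys = [0, 0, 3, 0, 0]
print(relevance(xs, ys, "Visvalingam"))  # [inf, 1.5, 3.0, 1.5, inf]
print(relevance(xs, ys)[2])              # 3.0
```

In both measures the spike at index 2 scores highest, so it is the last interior point to be dropped when the least relevant points are removed first.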

This post on plotting large time series is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48840461/
