gpt4 book ai didi

绘制大型时间序列

转载 作者:行者123 更新时间:2023-12-04 19:28:26 24 4
gpt4 key购买 nike

问题摘要 :

是否有任何易于实现的算法来减少表示时间序列所需的点数而不改变它在图中的显示方式?

激励问题 :

我正在尝试以交互方式可视化从嵌入式系统以 ~20 kHz 记录的 10 到 15 个 channel 的数据。日志可以覆盖超过一个小时的时间,这意味着我正在处理 1e8 和 1e9 点之间。此外,我关心持续很短时间(即小于 1 毫秒)的潜在小异常,因此简单的抽取不是一种选择。

毫不奇怪,如果您做一些幼稚的事情并尝试将比专用 GPU 内存更大的数据数组交给它们,大多数绘图库会感到有点难过。在我的系统上,它实际上比这更糟;使用随机浮点数向量作为测试用例,在刷新率低于 1 FPS 之前,我只从库存的 Matlab 绘图函数和 Python + matplotlib 中获得了大约 5e7 点。

现有问题和解决方案:

这个问题有点类似于一些现有的问题,例如:

  • How to plot large data vectors accurately at all zoom levels in real time?
  • How to plot large time series (thousands of administration times/doses of a medication)?
  • [几个交叉验证的问题]

  • 但处理更大的数据集和/或以交互性为代价对保真度更加严格(获得 60 FPS 柔滑平滑的平移和缩放会很棒,但实际上,我会对 1 FPS 感到满意)。

    显然,需要某种形式的数据缩减。在搜索解决我的问题的现有工具时,我发现了两种范式:
  • 抽取但跟踪异常值:一个很好的例子是 Matlab + dsplot (即我上面链接的第一个问题的公认答案中建议的工具)。 dsplot 减少到固定数量的均匀间隔的点,然后添加回使用高通 FIR 滤波器的标准偏差识别的异常值。虽然这对于几类数据来说可能是一个可行的解决方案,但如果存在超过滤波器截止频率的大量频率内容并且可能需要调谐,则它可能会遇到困难。
  • 绘制最小值和最大值:使用这种方法,您将时间序列划分为对应于每个水平像素的间隔,并仅绘制每个间隔中的最小值和最大值。 Matlab + Plot (Big)是一个很好的例子,但使用 O(n) 的 min 和 max 计算使它在到达 1e8 或 1e9 点时有点慢。 mex 函数或 python 中的二叉搜索树可以解决这个问题,但实现起来很复杂。

  • 有没有更简单的解决方案可以满足我的要求?

    编辑 (2018-02-18):问题重构为专注于算法而不是实现算法的工具。

    最佳答案

    我在显示数百个传感器的压力时间序列时遇到了同样的问题,几年来每分钟都有样本。在某些情况下(例如清理数据时),我想查看所有异常值,而在其他情况下,我对趋势更感兴趣。所以我写了一个函数,可以使用两种方法减少数据点的数量:visvalingam 和 Douglas-Peucker。第一个倾向于删除异常值,第二个保留它们。我已经优化了该函数以处理大型数据集。
    在意识到所有绘图方法都无法处理那么多点之后,我这样做了,而那些能够处理的方法正在以我无法控制的方式抽取数据集。函数如下:

    function [X, Y, indices, relevance] = lineSimplificationI(X,Y,N,method,option)
    %lineSimplification Reduce the number of points of the line described by X
    %and Y to N. Preserving the most relevant ones.
    % Using an adapted method of visvalingam and Douglas-Peucker algorithms.
    % The number of points of the line is reduced iteratively until reaching
    % N non-NaN points. Repeated NaN points in original data are deleted but
    % non-repeated NaNs are preserved to keep line breaks.
    % The two available methods are
    %
    % Visvalingam: The relevance of a point is proportional to the area of
    % the triangle defined by the point and its two neighbors.
    %
    % Douglas-Peucker: The relevance of a point is proportional to the
    % distance between it and the straight line defined by its two neighbors.
    % Note that the implementation here is iterative but NOT recursive as in
    % the original algorithm. This allows to better handle large data sets.
    %
    % DIFFERENCES: Visvalingam tend to remove outliers while Douglas-Peucker
    % keeps them.
    %
    % INPUTS:
    % X: X coordinates of the line points
    % Y: Y coordinates of the line points
    % method: Either 'Visvalingam' or 'DouglasPeucker' (default)
    % option: Either 'silent' (default) or 'verbose' if additional outputs
    % of the calculations are desired.
    %
    % OUTPUTS:
    % X: X coordinates of the simplified line points
    % Y: Y coordinates of the simplified line points
    % indices: Indices to the positions of the points preserved in the
    % original X and Y. Therefore Output X is equal to the input
    % X(indices).
    % relevance: Relevance of the returned points. It can be used to furder
    % simplify the line dinamically by keeping only points with
    % higher relevance. But this will produce bigger distortions of
    % the line shape than calling again lineSimplification with a
    % smaller value for N, as removing a point changes the relevance
    % of its neighbors.
    %
    % Implementation by Camilo Rada - camilo@rada.cl
    %

    if nargin < 3
    error('Line points positions X, Y and target point count N MUST be specified');
    end
    if nargin < 4
    method='DouglasPeucker';
    end
    if nargin < 5
    option='silent';
    end

    doDisplay=strcmp(option,'verbose');

    X=double(X(:));
    Y=double(Y(:));
    indices=1:length(Y);

    if length(X)~=length(Y)
    error('Vectors X and Y MUST have the same number of elements');
    end

    if N>=length(Y)
    relevance=ones(length(Y),1);
    if doDisplay
    disp('N is greater or equal than the number of points in the line. Original X,Y were returned. Relevances were not computed.')
    end
    return
    end
    % Removing repeated NaN from Y
    % We find all the NaNs with another NaN to the left
    repeatedNaNs= isnan(Y(2:end)) & isnan(Y(1:end-1));
    %We also consider a repeated NaN the first element if NaN
    repeatedNaNs=[isnan(Y(1)); repeatedNaNs(:)];
    Y=Y(~repeatedNaNs);
    X=X(~repeatedNaNs);
    indices=indices(~repeatedNaNs);

    %Removing trailing NaN if any
    if isnan(Y(end))
    Y=Y(1:end-1);
    X=X(1:end-1);
    indices=indices(1:end-1);
    end

    pCount=length(X);

    if doDisplay
    disp(['Initial point count = ' num2str(pCount)])
    disp(['Non repeated NaN count in data = ' num2str(sum(isnan(Y)))])
    end

    iterCount=0;

    while pCount>N
    iterCount=iterCount+1;
    % If the vertices of a triangle are at the points (x1,y1) , (x2, y2) and
    % (x3,y3) the are uf such triangle is
    % area = abs((x1*(y2-y3)+x2*(y3-y1)+x3*(y1-y2))/2)
    % now the areas of the triangles defined by each point of X,Y and its two
    % neighbors are

    twiceTriangleArea =abs((X(1:end-2).*(Y(2:end-1)-Y(3:end))+X(2:end-1).*(Y(3:end)-Y(1:end-2))+X(3:end).*(Y(1:end-2)-Y(2:end-1))));

    switch method
    case 'Visvalingam'
    % In this case the relevance is given by the area of the
    % triangle formed by each point end the two points besides
    relevance=twiceTriangleArea/2;
    case 'DouglasPeucker'
    % In this case the relevance is given by the minimum distance
    % from the point to the line formed by its two neighbors
    neighborDistances=ppDistance([X(1:end-2) Y(1:end-2)],[X(3:end) Y(3:end)]);
    relevance=twiceTriangleArea./neighborDistances;
    otherwise
    error(['Unknown method: ' method]);
    end
    relevance=[Inf; relevance; Inf];
    %We remove the pCount-N least relevant points as long as they are not contiguous

    [srelevance, sortorder]= sort(relevance,'descend');
    firstFinite=find(isfinite(srelevance),1,'first');
    startPos=uint32(firstFinite+N+1);
    toRemove=sort(sortorder(startPos:end));
    if isempty(toRemove)
    break;
    end

    %Now we have to deal with contigous elements, as removing one will
    %change the relevance of the neighbors. Therefore we have to
    %identify pairs of contigous points and only remove the one with
    %leeser relevance

    %Contigous will be true for an element if the next or the previous
    %element is also flagged for removal
    contiguousToKeep=[diff(toRemove(:))==1; false] | [false; (toRemove(1:end-1)-toRemove(2:end))==-1];
    notContiguous=~contiguousToKeep;

    %And the relevances asoociated to the elements flagged for removal
    contRel=relevance(toRemove);

    % Now we rearrange contigous so it is sorted in two rows, therefore
    % if both rows are true in a given column, we have a case of two
    % contigous points that are both flagged for removal
    % this process is demenden of the rearrangement, as contigous
    % elements can end up in different colums, so it has to be done
    % twice to make sure no contigous elements are removed
    nContiguous=length(contiguousToKeep);

    for paddingMode=1:2
    %The rearragngement is only possible if we have an even number of
    %elements, so we add one dummy zero at the end if needed
    if paddingMode==1
    if mod(nContiguous,2)
    pcontiguous=[contiguousToKeep; false];
    pcontRel=[contRel; -Inf];
    else
    pcontiguous=contiguousToKeep;
    pcontRel=contRel;
    end
    else
    if mod(nContiguous,2)
    pcontiguous=[false; contiguousToKeep];
    pcontRel=[-Inf; contRel];
    else
    pcontiguous=[false; contiguousToKeep(1:end-1)];
    pcontRel=[-Inf; contRel(1:end-1)];
    end
    end

    contiguousPairs=reshape(pcontiguous,2,[]);
    pcontRel=reshape(pcontRel,2,[]);

    %finding colums with contigous element
    contCols=all(contiguousPairs);
    if ~any(contCols) && paddingMode==2
    break;
    end
    %finding the row of the least relevant element of each column
    [~, lesserElementRow]=max(pcontRel);

    %The index in contigous of the first element of each pair is
    if paddingMode==1
    firstElementIdx=((1:size(contiguousPairs,2))*2)-1;
    else
    firstElementIdx=((1:size(contiguousPairs,2))*2)-2;
    end

    % and the index in contigous of the most relevant element of each
    % pair is
    lesserElementIdx=firstElementIdx+lesserElementRow-1;

    %now we set the least relevant element as NOT continous, so it is
    %removed
    contiguousToKeep(lesserElementIdx(contCols))=false;
    end
    %and now we delete the relevant continous points from the toRemove
    %list
    toRemove=toRemove(contiguousToKeep | notContiguous);

    if any(diff(toRemove(:))==1) && doDisplay
    warning([num2str(sum(diff(toRemove(:))==1)) ' continous elements removed in one iteration.'])
    end
    toRemoveLogical=false(pCount,1);
    toRemoveLogical(toRemove)=true;

    X=X(~toRemoveLogical);
    Y=Y(~toRemoveLogical);
    indices=indices(~toRemoveLogical);

    pCount=length(X);
    nRemoved=sum(toRemoveLogical);
    if doDisplay
    disp(['Iteration ' num2str(iterCount) ', Point count = ' num2str(pCount) ' (' num2str(nRemoved) ' removed)'])
    end
    if nRemoved==0
    break;
    end
    end
    end

    function d = ppDistance(p1,p2)
    d=sqrt((p1(:,1)-p2(:,1)).^2+(p1(:,2)-p2(:,2)).^2);
    end

    关于绘制大型时间序列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48840461/

    24 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com