python - Amortized O(1) rolling minimum implemented in Python Numba/NumPy

Reposted by: 塔克拉玛干. Updated: 2023-11-03 04:20:38

I am trying to implement a rolling minimum with an amortized O(1) get_min(). The amortized O(1) algorithm comes from the accepted answer in this post.
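For readers unfamiliar with the idea, an amortized O(1) get_min() is commonly obtained from a queue built out of two stacks, each of which tracks its own running minimum. The sketch below is an illustration written under that assumption, not the code from the linked answer; the names MinQueue and rolling_min_queue are hypothetical.

```python
class MinQueue:
    """FIFO queue with amortized O(1) push, pop and get_min (illustrative sketch)."""

    def __init__(self):
        # Each stack entry is (value, minimum of this entry and everything below it).
        self._in = []   # receives pushed items
        self._out = []  # serves popped items

    @staticmethod
    def _push(stack, value):
        low = value if not stack else min(value, stack[-1][1])
        stack.append((value, low))

    def push(self, value):
        self._push(self._in, value)

    def pop(self):
        if not self._out:
            # Move everything across in one go. Each item is moved at most once,
            # which is where the amortized O(1) bound comes from.
            while self._in:
                self._push(self._out, self._in.pop()[0])
        return self._out.pop()[0]

    def get_min(self):
        # The overall minimum is the smaller of the two stack-top minima.
        return min(s[-1][1] for s in (self._in, self._out) if s)


def rolling_min_queue(values, n):
    """Minimum of every full window of length n, using MinQueue."""
    q, out = MinQueue(), []
    for i, x in enumerate(values):
        q.push(x)
        if i >= n:
            q.pop()          # drop the item that just left the window
        if i >= n - 1:
            out.append(q.get_min())
    return out
```

This pure-Python version is only meant to show the mechanics; the Numba code in the question specialises the same idea to arrays and a fixed window size.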

The original function:

import pandas as pd
import numpy as np
from numba import njit, prange

def rolling_min_original(data, n):
    return pd.Series(data).rolling(n).min().to_numpy()

My attempt at implementing the amortized O(1) get_min() algorithm (this function performs decently when n is not small):

@njit
def rollin_min(data, n):
    """
        brief explanations:

        param: stk2: the stack2 in the algorithm, except here it only stores the min stack
        param: stk2_top: it starts at n-1 and drops gradually until it hits -1, then it comes back up to n-1
            if stk2_top= 0 in the current iteration(it will become -1 at the end):
                that means stk2_top is pointing at the bottom element in stk2, 
            after it drops to -1 from 0, in the next iteration, stk2 will  be reassigned to a new array data[i-n+1:i+1],
            because we need to include the current index. 

        at each iteration:
        if stk2_top < 0: (i.e. stk2 is empty)
            - copy the past n items(including the current one) to stk2, so that stk2 has n items now
            - pick the top min from stk2(stk2_top = n-1 momentarily)
            - move down the pointer by 1 after the operation(n-1 becomes n-2)

        else: (i.e. we have j(1<=j<= n-1) stuff in stk2)
            - pick the top min from stk2(stk2_top is j-1 momentarily)
            - move down the pointer by 1 after the operation(j-1 becomes j-2)

    """


    if n >1:  

        def min_getter_rev(arr1):
            arr = arr1[::-1]
            result = np.empty(len(arr), dtype = arr1.dtype)
            result[0]= local_min = arr[0]

            for i in range(1,len(arr)):
                if arr[i] < local_min:
                    local_min = arr[i]
                result[i] = local_min
            return result

        result_min= np.empty(len(data), dtype= data.dtype)
        for i in prange(n-1):
            result_min[i] =np.nan


        stk2 = min_getter_rev(data[:n])
        stk2_top = n-2#it is n-2 because the loop starts at n(not n-1)which is the second non nan term
        stk1_min = data[n-1]#stk1 needs to be the first item of the stk1
        result_min[n-1]= min(stk1_min, stk2[-1])    

        for i in range(n,len(data)):
            if stk2_top >= 0:
                if data[i] < stk1_min:
                    stk1_min= min(data[i], stk1_min)#the stk1 min
                result_min[i] = min(stk1_min, stk2[stk2_top])#min of the top element in stk2 and the current element

            else:
                stk2 = min_getter_rev(data[i-n+1:i+1])
                stk2_top= n-1
                stk1_min = data[i]
                result_min[i]= min(stk1_min, stk2[n-1])

            stk2_top -= 1

        return result_min   
    else:
        return data

The naive implementation, for when n is small:

@njit(parallel= True)
def rolling_min_smalln(data, n):
    result= np.empty(len(data), dtype= data.dtype)

    for i in prange(n-1):
        result[i]= np.nan

    for i in prange(n-1, len(data)):
        result[i]= data[i-n+1: i+1].min()

    return result

A small amount of test code:

def remove_nan(arr):
    return arr[~np.isnan(arr)]

if __name__ == '__main__':

    np.random.seed(0)
    data_size = 200000
    data = np.random.uniform(0,1000, size = data_size)+29000

    w_size = 37

    r_min_original= rolling_min_original(data, w_size)
    rmin1 = rollin_min(data, w_size)

    r_min_original = remove_nan(r_min_original)
    rmin1 = remove_nan(rmin1)

    print(np.array_equal(r_min_original,rmin1))

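The function rollin_min() has a nearly constant running time, and when n is large it runs faster than rolling_min_original(), which is good. However, it performs poorly when n is small (roughly n < 37 on my computer; in this range rollin_min() is easily beaten by the naive implementation rolling_min_smalln()).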
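Since the crossover point depends on the machine, it may help to measure it directly. The following is a minimal timing sketch using only the standard library; best_time and crossover are hypothetical helper names, and the warm-up call is there on the assumption that the functions being timed are Numba-compiled, so that compilation is not counted.

```python
import timeit


def best_time(fn, *args, repeats=5, number=3):
    """Best per-call wall time in seconds.

    One warm-up call is made first so that one-off costs
    (for example Numba JIT compilation) are not measured.
    """
    fn(*args)
    runs = timeit.repeat(lambda: fn(*args), repeat=repeats, number=number)
    return min(runs) / number


def crossover(fast_for_small, fast_for_large, data, sizes):
    """Smallest window size in `sizes` at which fast_for_large wins, or None."""
    for n in sizes:
        if best_time(fast_for_large, data, n) < best_time(fast_for_small, data, n):
            return n
    return None
```

With the functions from the question, a call such as crossover(rolling_min_smalln, rollin_min, data, range(2, 100)) would report where rollin_min() starts to win on a given machine.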

I have been trying to find ways to improve rollin_min(), but so far I am stuck, which is why I am asking for help here.

My questions are as follows:

Is the algorithm I am implementing the best one available for a rolling/sliding-window min/max?

If not, what is the best (or a better) algorithm? If it is, how can the function be improved further from an algorithmic point of view?

Apart from the algorithm itself, what other ways are there to further improve the performance of rollin_min()?

EDIT: Following multiple requests, I have moved my latest answer to the answers section.

Best answer

The main reason your code runs slowly is probably the allocation of a new array in min_getter_rev. You should reuse the same storage throughout.
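As a rough illustration of reusing storage, the helper below fills a caller-supplied buffer instead of returning a fresh array. It is a sketch, not the answerer's code: min_getter_rev_into is a hypothetical name, and the output layout is assumed to match min_getter_rev from the question (entry k holds the minimum of the last k + 1 items of the window).

```python
import numpy as np


def min_getter_rev_into(window, out):
    """Fill out[k] with the minimum of the last k + 1 items of window.

    Same layout as min_getter_rev in the question, but the caller
    supplies the output buffer, so nothing is allocated per call.
    """
    m = len(window)
    local_min = window[m - 1]
    for k in range(m):
        value = window[m - 1 - k]
        if value < local_min:
            local_min = value
        out[k] = local_min


data = np.array([5.0, 3.0, 8.0, 1.0, 4.0, 7.0])
n = 3
stk2 = np.empty(n, dtype=data.dtype)   # allocated once, reused for every refill

min_getter_rev_into(data[0:3], stk2)
first_fill = stk2.copy()               # copied only so both fills can be inspected
min_getter_rev_into(data[3:6], stk2)
second_fill = stk2.copy()
```

In the question's function the buffer would be created once before the main loop, and each `stk2 = min_getter_rev(...)` call would become an in-place refill. Whether the helper should also be decorated with @njit is left as an assumption to check in your own setup.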

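Then, because you don't really have to implement a queue, there are more optimizations available. For example, the sizes of the two stacks add up to at most (and usually exactly) n, so you can keep both in the same array of size n. Grow one from the start and the other from the end.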

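You will notice that there is a very regular pattern: fill the array in order from start to end, recalculate the minimums from the end, generate outputs as you refill the array, repeat...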

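This leads to an algorithm that is actually simpler, with a simpler explanation that doesn't involve stacks at all. Here is an implementation, with comments explaining how it works. Note that I didn't bother to pad the beginning with NaNs: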

def rollin_min(data, n):

    #allocate the result.  Note that the number of valid windows is len(data)-(n-1)
    result = np.empty(len(data)-(n-1), data.dtype)

    #every nth position is a "mark"
    #every window therefore contains exactly 1 mark
    #the minimum in the window is the minimum of:
    #  the minimum from the window start to the following mark; and
    #  the minimum from the window end to the preceding (same) mark

    #calculate the minimum from every window start index to the next mark
    for mark in range(n-1, len(data), n):
        v = data[mark]
        if (mark < len(result)):
            result[mark] = v
        for i in range(mark-1, mark-n, -1):
            v = min(data[i],v)
            if (i < len(result)):
                result[i] = v

    #for each window, calculate the running minimum from the preceding mark
    # to its end.  The first window ends at the first mark
    #then combine it with the value from the first pass to get the window minimum

    nextMarkPos = 0
    for i in range(0,len(result)):
        if i == nextMarkPos:
            v = data[i+n-1]
            nextMarkPos += n
        else:
            v = min(data[i+n-1],v)
        result[i] = min(result[i],v)

    return result
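Because the function above returns only the len(data)-(n-1) valid windows, its output is n-1 items shorter than the pandas result in the question. The sketch below shows one way to line the two up and to cross-check against a brute-force reference. The helper names pad_front_with_nan and rolling_min_reference are hypothetical, and the reference relies on numpy.lib.stride_tricks.sliding_window_view, which assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pad_front_with_nan(valid, n):
    """Prepend n - 1 NaNs so the output has the same length as the input data."""
    out = np.full(len(valid) + n - 1, np.nan)
    out[n - 1:] = valid
    return out


def rolling_min_reference(data, n):
    """O(len(data) * n) reference: the minimum of each full window."""
    return sliding_window_view(data, n).min(axis=1)


data = np.array([4.0, 2.0, 5.0, 1.0, 3.0, 6.0])
valid = rolling_min_reference(data, 3)   # stands in for rollin_min(data, 3)
padded = pad_front_with_nan(valid, 3)
```

A check such as np.array_equal(rollin_min(data, n), rolling_min_reference(data, n)) on random input is a quick way to confirm the mark-based version for a given n.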

Regarding "python - Amortized O(1) rolling minimum implemented in Python Numba/NumPy", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58046739/
