python - 这个递归函数能否变成具有类似性能的迭代函数？-6ren

python - 这个递归函数能否变成具有类似性能的迭代函数？

转载作者：行者123 更新时间：2023-12-03 07:51:32

我正在使用 numba 在 python 中编写一个函数来标记 2D 或 3D 数组中的对象，这意味着输入数组中具有相同值的所有正交连接单元将在输出数组中获得从 1 到 N 的唯一标签，其中 N 是正交连接组的数量。它与 scipy.ndimage.label 等功能非常相似。以及 scikit-image 等库中的类似函数，但这些函数标记所有正交连接的非零单元格组，因此它将合并具有不同值的连接组，这是我不想要的。例如，给定以下输入:

[0 0 7 7 0 0
 0 0 7 0 0 0
 0 0 0 0 0 7
 0 6 6 0 0 7
 0 0 4 4 0 0]

scipy 函数将返回

[0 0 1 1 0 0
 0 0 1 0 0 0
 0 0 0 0 0 3
 0 2 2 0 0 3
 0 0 2 2 0 0]

请注意，6 和 4 已合并到标签 2 中。我希望将它们标记为单独的组，例如:

[0 0 1 1 0 0
 0 0 1 0 0 0
 0 0 0 0 0 4
 0 2 2 0 0 4
 0 0 3 3 0 0]

我asked this about a year ago并一直在接受的答案中使用该解决方案，但是我正在努力优化代码的运行时并重新审视这个问题。

对于我通常使用的数据大小，链接的解决方案需要大约 1 分 30 秒才能运行。我编写了以下递归算法，该算法作为常规 python 运行大约需要 30 秒，而 numba 的 JIT 在 1-2 秒内运行(旁注，我讨厌那个相邻的函数，任何让它不那么困惑同时仍然与 numba 兼容的提示将不胜感激):

@numba.njit
def adjacent(idx, shape):
    coords = []
    if len(shape) > 2:
        if idx[0] < shape[0] - 1:
            coords.append((idx[0] + 1, idx[1], idx[2]))
        if idx[0] > 0:
            coords.append((idx[0] - 1, idx[1], idx[2]))
        if idx[1] < shape[1] - 1:
            coords.append((idx[0], idx[1] + 1, idx[2]))
        if idx[1] > 0:
            coords.append((idx[0], idx[1] - 1, idx[2]))
        if idx[2] < shape[2] - 1:
            coords.append((idx[0], idx[1], idx[2] + 1))
        if idx[2] > 0:
            coords.append((idx[0], idx[1], idx[2] - 1))
    else:
        if idx[0] < shape[0] - 1:
            coords.append((idx[0] + 1, idx[1]))
        if idx[0] > 0:
            coords.append((idx[0] - 1, idx[1]))
        if idx[1] < shape[1] - 1:
            coords.append((idx[0], idx[1] + 1))
        if idx[1] > 0:
            coords.append((idx[0], idx[1] - 1))
    return coords


@numba.njit
def apply_label(labels, decoded_image, current_label, idx):
    labels[idx] = current_label
    for aidx in adjacent(idx, labels.shape):
        if decoded_image[aidx] == decoded_image[idx] and labels[aidx] == 0:
            apply_label(labels, decoded_image, current_label, aidx)


@numba.njit
def label_image(decoded_image):
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    for idx in zip(*np.where(decoded_image >= 0)):
        if labels[idx] == 0:
            current_label += 1
            apply_label(labels, decoded_image, current_label, idx)
    return labels, current_label

这适用于某些数据，但在其他数据上崩溃，我发现问题是当有非常大的对象要标记时，就会达到递归限制。我尝试重写 label_image 以不使用递归，但现在使用 numba 需要大约 10 秒。与我开始的地方相比仍然有很大的改进，但似乎应该可以获得与递归版本相同的性能。这是我的迭代版本:

@numba.njit
def label_image(decoded_image):
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    for idx in zip(*np.where(decoded_image >= 0)):
        if labels[idx] == 0:
            current_label += 1
            idxs = [idx]
            while idxs:
                cidx = idxs.pop()
                if labels[cidx] == 0:
                    labels[cidx] = current_label
                    for aidx in adjacent(cidx, labels.shape):
                        if labels[aidx] == 0 and decoded_image[aidx] == decoded_image[idx]:
                            idxs.append(aidx)
    return labels, current_label

有什么办法可以改善这个问题吗？

最佳答案

Can this recursive function be turned into an iterative function with similar performance?

将其转换为迭代函数很简单，考虑到它只是一个简单的深度优先搜索(您也可以使用队列而不是堆栈来使用广度优先搜索，两者都可以)。只需使用堆栈来跟踪要访问的节点即可。这是适用于任意数量维度的通用解决方案:

def label_image(decoded_image):
    shape = decoded_image.shape
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    for idx in zip(*np.where(decoded_image > 0)):
        if labels[idx] == 0:
            current_label += 1
            stack = [idx]
            while stack:
                top = stack.pop()
                labels[top] = current_label
                for i in range(0, len(shape)):
                    if top[i] > 0:
                        neighbor = list(top)
                        neighbor[i] -= 1
                        neighbor = tuple(neighbor)
                        if decoded_image[neighbor] == decoded_image[idx] and labels[neighbor] == 0:
                            stack.append(neighbor)
                    if top[i] < shape[i] - 1:
                        neighbor = list(top)
                        neighbor[i] += 1
                        neighbor = tuple(neighbor)
                        if decoded_image[neighbor] == decoded_image[idx] and labels[neighbor] == 0:
                            stack.append(neighbor)
    return labels

尽管从元组的第 i 个分量中添加或减去一个很尴尬(我将在此处查看临时列表)，并且 numba 不接受它(类型错误)。一种简单的解决方案是显式编写 2d 和 3d 版本，这可能会极大地提高性能:

@numba.njit
def label_image_2d(decoded_image):
    w, h = decoded_image.shape
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    for idx in zip(*np.where(decoded_image > 0)):
        if labels[idx] == 0:
            current_label += 1
            stack = [idx]
            while stack:
                x, y = stack.pop()
                if decoded_image[x, y] != decoded_image[idx] or labels[x, y] != 0:
                    continue # already visited or not part of this group
                labels[x, y] = current_label
                if x > 0: stack.append((x-1, y))
                if x+1 < w: stack.append((x+1, y))
                if y > 0: stack.append((x, y-1))
                if y+1 < h: stack.append((x, y+1))
    return labels

@numba.njit
def label_image_3d(decoded_image):
    w, h, l = decoded_image.shape
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    for idx in zip(*np.where(decoded_image > 0)):
        if labels[idx] == 0:
            current_label += 1
            stack = [idx]
            while stack:
                x, y, z = stack.pop()
                if decoded_image[x, y, z] != decoded_image[idx] or labels[x, y, z] != 0:
                    continue # already visited or not part of this group
                labels[x, y, z] = current_label
                if x > 0: stack.append((x-1, y, z))
                if x+1 < w: stack.append((x+1, y, z))
                if y > 0: stack.append((x, y-1, z))
                if y+1 < h: stack.append((x, y+1, z))
                if z > 0: stack.append((x, y, z-1))
                if z+1 < l: stack.append((x, y, z+1))
    return labels

def label_image(decoded_image):
    dim = len(decoded_image.shape)
    if dim == 2:
        return label_image_2d(decoded_image)
    assert dim == 3
    return label_image_3d(decoded_image)

另请注意，迭代解决方案不受堆栈限制的影响:np.full((100,100,100), 1) 在迭代解决方案中工作得很好，但在递归解决方案中失败(段错误)如果使用 numba)。

做一个非常基本的基准测试

for i in range(1, 10000):
    label_image(np.full((20,20,20), i))

(多次迭代以尽量减少 JIT 的影响，也可以进行一些预热运行，然后开始测量时间或类似的操作)

迭代解决方案似乎快了几倍(~~在我的机器上大约是 5 倍~~，见下文)。您可能可以优化递归解决方案并使其达到可比较的速度，例如通过避免临时 coords 列表或将 np.where 更改为 > 0。

我不知道 numba 可以如何优化压缩的 np.where。为了进一步优化，您可以考虑(和基准测试)使用显式嵌套的 for x in range(0, w): for y in range(0, h): 循环。

为了与 Nick 提出的合并策略保持竞争力，我进一步优化了这一策略，挑选了一些容易实现的目标:

使用 continue 而不是 np.where 将 zip 转换为显式循环。
将 decoded_image[idx] 存储在局部变量中(理想情况下应该不重要，但不会造成伤害)。
重用堆栈。这可以防止不必要的(重新)分配和 GC 压力。还可以考虑为堆栈提供初始容量(分别为 w*h 或 w*h*l)。

@numba.njit
def label_image_2d(decoded_image):
    w, h = decoded_image.shape
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    stack = []
    for sx in range(0, w):
        for sy in range(0, h):
            start = (sx, sy)
            image_label = decoded_image[start]
            if image_label <= 0 or labels[start] != 0:
                continue
            current_label += 1
            stack.append(start)
            while stack:
                x, y = stack.pop()
                if decoded_image[x, y] != image_label or labels[x, y] != 0:
                    continue # already visited or not part of this group
                labels[x, y] = current_label
                if x > 0: stack.append((x-1, y))
                if x+1 < w: stack.append((x+1, y))
                if y > 0: stack.append((x, y-1))
                if y+1 < h: stack.append((x, y+1))
    return labels

@numba.njit
def label_image_3d(decoded_image):
    w, h, l = decoded_image.shape
    labels = np.zeros_like(decoded_image, dtype=np.uint32)
    current_label = 0
    stack = []
    for sx in range(0, w):
        for sy in range(0, h):
            for sz in range(0, l):
                start = (sx, sy, sz)
                image_label = decoded_image[start]
                if image_label <= 0 or labels[start] != 0:
                    continue
                current_label += 1
                stack.append(start)
                while stack:
                    x, y, z = stack.pop()
                    if decoded_image[x, y, z] != image_label or labels[x, y, z] != 0:
                        continue # already visited or not part of this group
                    labels[x, y, z] = current_label
                    if x > 0: stack.append((x-1, y, z))
                    if x+1 < w: stack.append((x+1, y, z))
                    if y > 0: stack.append((x, y-1, z))
                    if y+1 < h: stack.append((x, y+1, z))
                    if z > 0: stack.append((x, y, z-1))
                    if z+1 < l: stack.append((x, y, z+1))
    return labels

然后，我拼凑了一个基准来比较四种方法(原始递归、旧迭代、新迭代、基于合并)，并将它们放入四个不同的模块中:

import numpy as np
import timeit

import rec
import iter_old
import iter_new
import merge

shape = (100, 100, 100)
n = 20
for module in [rec, iter_old, iter_new, merge]:
    print(module)

    label_image = module.label_image
    # Trigger compilation of 2d & 3d functions
    label_image(np.zeros((1, 1)))
    label_image(np.zeros((1, 1, 1)))

    i = 0
    def test_full():
        global i
        i += 1
        label_image(np.full(shape, i))
    print("single group:", timeit.timeit(test_full, number=n))
    print("random (few groups):", timeit.timeit(
        lambda: label_image(np.random.randint(low = 1, high = 10, size = shape)),
        number=n))
    print("random (many groups):", timeit.timeit(
        lambda: label_image(np.random.randint(low = 1, high = 400, size = shape)),
        number=n))
    print("only groups:", timeit.timeit(
        lambda: label_image(np.arange(np.prod(shape)).reshape(shape)),
        number=n))

这会输出类似的内容

<module 'rec' from '...'>
single group: 32.39212468900041
random (few groups): 14.648884047001047
random (many groups): 13.304533919001187
only groups: 13.513677138000276
<module 'iter_old' from '...'>
single group: 10.287227957000141
random (few groups): 17.37535468200076
random (many groups): 14.506630064999626
only groups: 13.132202609998785
<module 'iter_new' from '...'>
single group: 7.388022166000155
random (few groups): 11.585243002000425
random (many groups): 9.560101995000878
only groups: 8.693653742000606
<module 'merge' from '...'>
single group: 14.657021331999204
random (few groups): 14.146574055999736
random (many groups): 13.412314713001251
only groups: 12.642367746000673

在我看来，改进的迭代方法可能会更好。请注意，最初的基本基准似乎是递归变体的最坏情况。一般来说差异不会那么大。

测试的阵列非常小(20立方)。如果我使用较大的数组 (100³) 和较小的 n (20) 进行测试，我大致会得到以下结果(省略了 rec，因为由于堆栈限制，它会出现段错误):

<module 'iter_old' from '...'>
single group: 3.5357716739999887
random (few groups): 4.931695729999774
random (many groups): 3.4671142009992764
only groups: 3.3023930709987326
<module 'iter_new' from '...'>
single group: 2.45903080700009
random (few groups): 2.907660342001691
random (many groups): 2.309699692999857
only groups: 2.052835552000033
<module 'merge' from '...'>
single group: 3.7620838259990705
random (few groups): 3.3524249689999124
random (many groups): 3.126650959999097
only groups: 2.9456547739991947

迭代方法似乎仍然更有效。

关于python - 这个递归函数能否变成具有类似性能的迭代函数？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/76956654/

文章推荐： java - 互斥的java方法执行(不是全部)

文章推荐： dbus - 忽略 Vala 中的 DBUS 方法

文章推荐： python - 如何解决此功能，使其对程序中的所有选项都有效？

typescript - 为什么元组推理从 any[] 变成 [string, string, ... ] 而从 string[] 变成 ['literal1' , 'literal2' , ... ]
我已经阅读了这个答案https://stackoverflow.com/a/45486495/1108891 ，它演示了元组类型推断。经过一些实验，我遇到了这种情况。当我们的 tuple 函数有 T
javascript - 如果放入 100 显示从 1 到 100 的每个数字，没有零，我想删除零 ex 20 变成 2 和 100 变成 1
我想删除零，我喜欢这个模型，如果我输入 1000 然后额外的表显示从 1 到 1000 的所有数字，每个数字都会检查并删除零。示例:如果我输入 10然后输出如 1 2 3 .....8 9 1(1
powershell - 在日文机器上发出运行powershell脚本，\变成¥
鉴于我对PowerShell的了解仍在开发中，请与我一起提出任何建议/答案。因此，在我工作的地方我们工作的公司拥有大量日文机器，需要注册Intune。但是，我们正在运行的脚本无法在其计算机上运行，
java - 如何将\\变成\
我刚刚制作了一个将路径保存到 INI 文件中的小程序。但是在输出中，路径是这样写的: C:\\Windows 我想这样写: C:\Windows 我用 string.replace 尝试了很多方法，
C++如何将字符串中的所有 `\`变成 `/`？
所以我尝试 std::replace(diff_path.begin(), diff_path.end(), "\\", "/"); 但它似乎无法在我的 Visual Studio 上编译.怎么办 -
javascript - 符号 (&) 变成 &
我使用以下代码每 30 秒自动抓取/设置最新的页面标题: setInterval(function() { var data = "http://mysite.com/mypage
iphone - UITableView 变成 UIScrollView？
我希望有两个 View 是组成集的一部分。每个 View 中的数据最好在 UITableView 中表示。然后，我想添加一个手势来使 View 在屏幕上闪烁，并引入另一个类似的 View ，并带有页面
javascript - "This"变成 "window"JavaScript
我正在尝试开发一个小游戏，但我遇到了以下问题:我有一个伪类“Cannon”，每个 Cannon 都有一个存储它应该守卫的区域的数组和一个存储“入侵者”的数组进入其中一个戒备区。我创建了下一个函数作为
mysql insert 将★变成★...
当我从应用程序中进行插入时，所有 ★(星号)都会变成“â…” 如何阻止这种情况发生？ *如果我直接通过 phpmyadmin 插入它，它就可以工作，但使用这个 php 时则不行: connect_er
ios - NSDictionary 变成 NSCFString
我遇到了一个奇怪的问题，将 NSDictionary 存储到 NSUserDefaults，然后检索它会将其转换为 NSCFString。这是我保存数据的地方: - (void)saveProgre
php file_get_contents($url) & 变成 &
我正在尝试像这样向 coinbase api 发出请求 $url = "https://api.gdax.com/products/BTC-USD/candles?start=".date($form
ios - NSHTTPURLResponse 变成 NSURLResponse
我在 HTTP header 中使用 if-modified-since 来决定是否应该下载文件。应用程序已经过测试，一切正常，但现在当我询问我的 NSHTTPURLResponse 实例 respo
c# - 0xFF 变成 0x3F
我向串口发送0xFF 结果是 0x3F。所有其他字节都是正确的。情况是这样的…… 外部盒子将这些字节发送到 PC... 0xFF, 0x0D, 0x00, 0x30, 0x31, 0x53, 0x
reactjs - href 变成 data-href
所以我在我的 Next JS 应用程序中遇到了这个奇怪的问题，我导入了谷歌字体，如下所示在我的浏览器中显示的不是 href，而是 data-href="..."，所以问题是谷歌无法将此识别为链接
qt - 将旋转的 QRect 变成 QPolygon
我试图通过将 QRect 变成 QPolygon 来检查 QPolygon 和 QRect 之间的碰撞。但是，矩形也可能有我添加的旋转，所以我想知道如何将 QRect 变成 QPolygon 并考虑到
haskell - 如何将 Sink 变成 Conduit？
我正在尝试写一个 Conduit使用 attoparsec解析器。具体来说，给定 parseOne :: Parser T , 我想构建一个 Conduit ByteString m T重复地将解析器
html - 停止变成 标记内的超链接我正在尝试添加和所以实际的文字出现在我的页面上。不是链接。所以我希望在我的页面上显示实际的 HTML，如下所示: 目前，出现了一个死图像...我想单独阻止了这一点，只是显示了普通的html？


								
								 
									security - 如何防止基于UDP的 channel 变成 “backdoor”？
									最近发现一些路由器设备包含后门，some of which可以通过单个UDP数据包加以利用。我意识到其中一些后门不一定是恶意的，因为我在自己的产品中也做了同样的事情以进行故障排除:打开套接字将心跳数据
								
								
								 
									macos - 将 Macbook 变成 iBeacon
									我知道我可以将 iOS 设备变成 iBeacons ( Can an iOS7 device act as an iBeacon? )。不幸的是，我只有一台设备，我的信标还没有到达。所以我想知道如何将
								
								
								 
									macos - 如何将 Mac 变成 iBeacon
									有没有人尝试过将 MAC 变成 iBeacon。我已经为 iOS 设备完成了此操作，并且想要一个类似的带有一些 UI 的 MAC 独立应用程序。我听说 Mavericks 上的新 API 支持 iBe


			
				
					
						
					
					
						行者123
					
					
						个人简介
						
							我是一名优秀的程序员,十分优秀！
						
					
					
				
				
					
					
						
							作者热门文章
						
						
							
								html - 出于某种原因，IE8 对我的 Sass 文件中继承的 html5 CSS 不友好？
							
								JMeter 在响应断言中使用 span 标签的问题
							
								html - 在 :hover and :active? 上具有不同效果的 CSS 动画
							
								html - 相对于居中的 html 内容固定的 CSS 重复背景？
							
						
					
					
			
			
			
					
						
							滴滴打车优惠券免费领取
						
						
					
					
					
					
					
				
					
					
						
							全站热门文章
						
						
							
								巧用mask属性创建一个纯CSS图标库
							
								Java代码覆盖率super-jacoco
							
								armmattermost
							
								Vulnhub经典靶机：from_sqli_to_shell_i386入门靶机
							
								【Rive】波动文字
							
								Vscode实现应用qss样式表
							
								DocforDevNow
							
								干掉EasyExcel！FastExcel初体验
							
								爬虫自动化脚本+AI赋能
							
								快手后端面试，被面试官秒挂了！


	

	



		
			Copyright 2021 - 2024 cfsdn All Rights Reserved   蜀ICP备2022000587号
			

			广告合作:1813099741@qq.com
			6ren.com

首页

博学

6Ren·AI

商城

python - 这个递归函数能否变成具有类似性能的迭代函数？