gpt4 book ai didi

python - 插入排序实现不正确,还是时间复杂度理解不正确?

转载 作者:塔克拉玛干 更新时间:2023-11-03 03:39:40 26 4
gpt4 key购买 nike

我一直在写一些python脚本来测试和计时各种常见的算法,纯粹是为了自己的熏陶。我敢肯定已经有资源可以做到这一点,但我发现自己编写它们很有帮助。

我编写了一个实现冒泡排序、选择排序和插入排序的脚本,并且针对每个脚本运行 10 次不同数组大小的迭代,以及最佳/最差/平均数组顺序情况。

在大多数情况下,我看到了我所期望的结果,例如选择排序总是花费相同的时间,而不管数组的顺序如何,而冒泡排序的表现却如预期的那样糟糕。我还看到插入排序的性能确实随着给定数组顺序的提高而提高,但是我对选择和插入的比较感到困惑。

我知道这两种算法的最坏情况时间复杂度都是 O(n^2),而且插入排序的平均时间复杂度优于选择排序,但我发现在很多情况下,插入排序的性能比选择排序差,这对我来说似乎是不正确的。我希望两者在最坏的情况下表现相同,而插入排序在不是最坏的情况下表现会更好。我是不是误解了如何解释这些结果,还是我在执行这两种算法时犯了错误?

这是我的脚本:

import random
import time
import sys
from enum import Enum

class Case(Enum):
BEST = 1
WORST = 2
AVERAGE = 3

def bubble_sort(arr):
sorted = False
while not sorted:
sorted = True
for i in range(0, len(arr)):
# n
if i + 1 < len(arr) and arr[i] > arr[i + 1]:
scratch = arr[i]
arr[i] = arr[i + 1]
arr[i + 1] = scratch
sorted = False
return arr

def selection_sort(arr):
for i in range(0, len(arr)):
# n
min_index = i
for j in range(i + 1, len(arr)):
# n
if arr[j] < arr[min_index]:
min_index = j
scratch = arr[i]
arr[i] = arr[min_index]
arr[min_index] = scratch
return arr

def insertion_sort(arr):
for i in range(1, len(arr)):
# n
index = i
while index > 0 and arr[index - 1] > arr[index]:
# worst case n, best case 1
scratch = arr[index]
arr[index] = arr[index - 1]
arr[index - 1] = scratch
index -= 1
return arr

TOTAL_RUNS = 10

def verify(algorithm, name):

# first let's test that it actually sorts correctly
arr = list(range(1, 20))
random.shuffle(arr)
arr = algorithm(arr)
for i in range(0, len(arr) - 1):
if arr[i] > arr[i + 1]:
raise Exception("NOT SORTED!")

print("timing " + name + " sort...")

def time_the_algorithm(algorithm, case):
total = 0
min = sys.maxsize
max = 0

sizes = [1000,5000,10000]

for size in sizes:
for i in range(0, TOTAL_RUNS):

arr = list(range(1, size))
if case == Case.WORST:
# for worst case, reverse entire array
arr = list(reversed(arr))
elif case == Case.AVERAGE:
# average case, random order
random.shuffle(arr)

start = time.time()
arr = algorithm(arr)
end = time.time()

elapsed = end - start
total += elapsed
if elapsed > max:
max = elapsed
if elapsed <= min:
min = elapsed

print(name + ", n={0:} - ".format(size) + str(case) + ": avg {0:.2f}s, min {1:.2f}s, max {2:.2f}s".format(total/TOTAL_RUNS, min, max))

# worst case
time_the_algorithm(algorithm, Case.WORST)
# avg case
time_the_algorithm(algorithm, Case.AVERAGE)
# best case
time_the_algorithm(algorithm, Case.BEST)


verify(insertion_sort, "insertion")
verify(selection_sort, "selection")
verify(bubble_sort, "bubble")

这是我的输出:

timing insertion sort...
insertion, n=1000 - Case.WORST: avg 0.06s, min 0.06s, max 0.06s
insertion, n=5000 - Case.WORST: avg 1.42s, min 0.06s, max 1.46s
insertion, n=10000 - Case.WORST: avg 6.90s, min 0.06s, max 5.70s
insertion, n=1000 - Case.AVERAGE: avg 0.03s, min 0.03s, max 0.03s
insertion, n=5000 - Case.AVERAGE: avg 0.71s, min 0.03s, max 0.70s
insertion, n=10000 - Case.AVERAGE: avg 3.44s, min 0.03s, max 2.76s
insertion, n=1000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s
insertion, n=5000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s
insertion, n=10000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s
timing selection sort...
selection, n=1000 - Case.WORST: avg 0.02s, min 0.02s, max 0.02s
selection, n=5000 - Case.WORST: avg 0.43s, min 0.02s, max 0.43s
selection, n=10000 - Case.WORST: avg 2.17s, min 0.02s, max 1.84s
selection, n=1000 - Case.AVERAGE: avg 0.01s, min 0.01s, max 0.02s
selection, n=5000 - Case.AVERAGE: avg 0.43s, min 0.01s, max 0.44s
selection, n=10000 - Case.AVERAGE: avg 2.30s, min 0.01s, max 1.93s
selection, n=1000 - Case.BEST: avg 0.01s, min 0.01s, max 0.02s
selection, n=5000 - Case.BEST: avg 0.42s, min 0.01s, max 0.41s
selection, n=10000 - Case.BEST: avg 2.26s, min 0.01s, max 1.92s
timing bubble sort...
bubble, n=1000 - Case.WORST: avg 0.11s, min 0.11s, max 0.11s
bubble, n=5000 - Case.WORST: avg 3.15s, min 0.11s, max 3.24s
bubble, n=10000 - Case.WORST: avg 15.09s, min 0.11s, max 13.66s
bubble, n=1000 - Case.AVERAGE: avg 0.09s, min 0.09s, max 0.10s
bubble, n=5000 - Case.AVERAGE: avg 2.62s, min 0.09s, max 2.63s
bubble, n=10000 - Case.AVERAGE: avg 12.53s, min 0.09s, max 10.90s
bubble, n=1000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s
bubble, n=5000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s
bubble, n=10000 - Case.BEST: avg 0.00s, min 0.00s, max 0.00s

编辑:

我听取了@asaf-rosemarin 的建议,并尝试用 for 循环替换 while 循环,看看这是否会使时间更均匀,但它似乎根本不会影响性能

def insertion_sort(arr):
for i in range(1, len(arr)):
# n
for j in range(i, 0, -1):
# worst case n, best case 1
if arr[j - 1] > arr[j]:
scratch = arr[j]
arr[j] = arr[j - 1]
arr[j - 1] = scratch
else:
break
return arr

输出:

timing insertion sort...
insertion, n=1000 - Case.AVERAGE: avg 0.03s, min 0.03s, max 0.03s
insertion, n=5000 - Case.AVERAGE: avg 0.72s, min 0.03s, max 0.74s
insertion, n=10000 - Case.AVERAGE: avg 3.61s, min 0.03s, max 3.13s
timing selection sort...
selection, n=1000 - Case.AVERAGE: avg 0.02s, min 0.02s, max 0.02s
selection, n=5000 - Case.AVERAGE: avg 0.47s, min 0.02s, max 0.51s
selection, n=10000 - Case.AVERAGE: avg 2.52s, min 0.02s, max 2.17s
timing bubble sort...
bubble, n=1000 - Case.AVERAGE: avg 0.10s, min 0.09s, max 0.10s
bubble, n=5000 - Case.AVERAGE: avg 2.56s, min 0.09s, max 2.50s
bubble, n=10000 - Case.AVERAGE: avg 12.31s, min 0.09s, max 10.34s

最佳答案

你对时间复杂度的理解是正确的,我在你的实现中没有发现任何错误,所以我猜测原因是for ... in rangewhile 快在 python 中循环。
(更多信息在这里Why is looping over range() in Python faster than using a while loop?)

编辑:

时间复杂度之间的比较与实现的实际运行时间之间的比较之间存在这种不一致的原因是时间复杂度只考虑比较的数量,而忽略了额外的操作开销(因为它是 O(1) 对于每个比较),但那些额外的操作及其实现方式(例如编译与解释、缓存友好性)可能会显着影响运行时间。

关于python - 插入排序实现不正确,还是时间复杂度理解不正确?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54620346/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com