
python - Radix sort for strings in Python

Reposted. Author: 行者123. Updated: 2023-12-01 21:43:14

Compared with Python's built-in sort, my radix sort function returns a list that is ordered, but in the wrong order:

My radix sort: ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
Python's sort: ['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']

* My radix sort does no padding
* It works from the least significant position (LSD) first
* I need to make use of the length of each word

Here is my code.

def count_sort_letters(array, size, col, base):
    output = [0] * size
    count = [0] * base
    min_base = ord('a')

    for item in array:
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        count[letter] += 1

    for i in range(base - 1):
        count[i + 1] += count[i]

    for i in range(size - 1, -1, -1):
        item = array[i]
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        output[count[letter] - 1] = item
        count[letter] -= 1

    return output


def radix_sort_letters(array):
    size = len(array)

    max_col = len(max(array, key=len))

    for col in range(max_col):
        array = count_sort_letters(array, size, col, 26)

    return array

Can anyone find a way to fix this?

Best answer

As I mentioned in the comments:

In your code the lines:

correct_index = min(len(item) - 1, col)
letter = ord(item[-(correct_index + 1)]) - min_base

always use the first letter of the word once col reaches or exceeds the word's length. Shorter words are therefore sorted by their first letter again on every remaining pass. For instance, ['aa', 'a'] remains unchanged because every pass of the for col loop compares the 'a' in both words, so the order never changes.
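To make this concrete, here is a small sketch that reports which character the original index expression selects for each column. The helper name picked_char is made up for illustration; the arithmetic inside it is copied from the question's code.

```python
def picked_char(item, col):
    # Same index arithmetic as the question's count_sort_letters
    correct_index = min(len(item) - 1, col)
    return item[-(correct_index + 1)]


words = ['aa', 'a']
# For each column, collect the key character chosen for every word
picks = {col: [picked_char(w, col) for w in words] for col in range(2)}
for col, chars in picks.items():
    print(col, chars)
```

Both columns give the key 'a' for both words, so a stable pass has no reason to move 'a' ahead of 'aa'.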

Corrected code

Note: I tried to keep changes to the original code to a minimum.

def count_sort_letters(array, size, col, base, max_len):
    """ Helper routine for performing a count sort based upon column col """
    output = [0] * size
    count = [0] * (base + 1)  # One additional cell to account for the dummy letter
    min_base = ord('a') - 1   # Subtract one to allow for the dummy character

    for item in array:  # Generate counts
        # Get column letter if within string, else use dummy position of 0
        letter = ord(item[col]) - min_base if col < len(item) else 0
        count[letter] += 1

    for i in range(len(count) - 1):  # Accumulate counts
        count[i + 1] += count[i]

    for item in reversed(array):
        # Get index of current letter of item at index col in count array
        letter = ord(item[col]) - min_base if col < len(item) else 0
        output[count[letter] - 1] = item
        count[letter] -= 1

    return output

def radix_sort_letters(array, max_col=None):
    """ Main sorting routine """
    if not max_col:
        max_col = len(max(array, key=len))  # Length of the longest word

    for col in range(max_col - 1, -1, -1):  # max_col-1, max_col-2, ..., 0
        array = count_sort_letters(array, len(array), col, 26, max_col)

    return array

lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))

Test

lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))

# Compare to Python sort
print(radix_sort_letters(lst)==sorted(lst))

Output

['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']
True

Explanation

Counting sort is a stable sort, meaning that items with equal keys keep the relative order they had in the input.
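As a rough illustration of stability, the sketch below is a generic counting sort keyed on an integer. The function name counting_sort_by_key and its parameters are assumptions made for this example, not part of the answer's code.

```python
def counting_sort_by_key(items, key, num_keys):
    """Stable counting sort of items by an integer key in range(num_keys)."""
    count = [0] * num_keys
    for item in items:
        count[key(item)] += 1
    # Prefix sums: count[k] becomes the number of items with key <= k
    for k in range(1, num_keys):
        count[k] += count[k - 1]
    output = [None] * len(items)
    # Walking the input backwards keeps equal keys in their original order
    for item in reversed(items):
        k = key(item)
        count[k] -= 1
        output[count[k]] = item
    return output


by_last = counting_sort_by_key(
    ['ac', 'xb', 'ab'],
    key=lambda w: ord(w[-1]) - ord('a'),  # key on the last letter
    num_keys=26,
)
print(by_last)  # ['xb', 'ab', 'ac']
```

'xb' and 'ab' share the key 'b', and 'xb' stays ahead of 'ab' just as it was in the input.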

Let's walk through an example to see how the function works.

Let's sort: ['ac', 'xb', 'ab']

We go through the characters of each word in reverse order.

Iteration 0:

Key is the last character of each word (i.e. index -1):
keys are ['c', 'b', 'b'] (last characters of 'ac', 'xb', and 'ab')

Performing a counting sort on these keys we get ['b', 'b', 'c']

This causes the corresponding words for these keys to be placed in
the order: ['xb', 'ab', 'ac']

Entries 'xb' and 'ab' have equal keys (value 'b') so they maintain their
order of 'xb' followed by 'ab' of the original list
(since counting sort is a stable sort)

Iteration 1:

Key is next to last character (i.e. index -2):

Keys are ['x', 'a', 'a'] (corresponding to list ['xb', 'ab', 'ac'])

Counting sort produces the key order ['a', 'a', 'x']
which causes the corresponding words to be placed in the order
['ab', 'ac', 'xb'] and we are done.
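The two passes above can be reproduced with a short sketch. It swaps in Python's built-in sorted, which is also stable, in place of the counting sort, and it assumes all words have the same length.

```python
words = ['ac', 'xb', 'ab']
passes = []
for col in (1, 0):  # last column first, then the first column
    # sorted() is stable, so ties keep the order from the previous pass
    words = sorted(words, key=lambda w: w[col])
    passes.append(list(words))
    print(col, words)
```

The first pass prints ['xb', 'ab', 'ac'] and the second prints ['ab', 'ac', 'xb'], matching iterations 0 and 1.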

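Original bug: your code originally went through the strings from left to right rather than from right to left. We need to go from right to left because we want the last sort to be based on the first character, the next-to-last sort on the second character, and so on.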

Strings of different lengths

The previous example was simplified by assuming strings of equal length. Now let's try strings of unequal length, such as:

['ac', 'a', 'ab']

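This immediately raises a problem: because the words are not the same length, we cannot pick a letter from every word on every pass.

We can fix this by padding each word with a dummy character (such as '*'):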

['ac', 'a*', 'ab']

Iteration 0: the keys are the last character of each word, so: ['c', '*', 'b']

The understanding is that the dummy character is less than all other
characters, so the sort order will be:
['*', 'b', 'c'] causing the related words to be sorted in the order

['a*', 'ab', 'ac']

Iteration 1: the keys are the next-to-last character of each word, so: ['a', 'a', 'a']

Since the keys are all equal, counting sort won't change the order, so we keep

['a*', 'ab', 'ac']

Removing the dummy character from each string (if any) we end up with:

['a', 'ab', 'ac']
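For comparison, here is a minimal sketch of the literal padding approach. It assumes the words contain only lowercase letters and never '*', relies on '*' sorting below 'a' in ASCII, and uses the stable built-in sorted for each pass instead of a counting sort. The name radix_sort_padded is hypothetical.

```python
PAD = '*'  # assumed never to occur in the input; '*' < 'a' in ASCII


def radix_sort_padded(words):
    width = max(map(len, words), default=0)
    # Pad on the right so every word has a character in every column
    padded = [w.ljust(width, PAD) for w in words]
    for col in range(width - 1, -1, -1):  # last column first
        padded = sorted(padded, key=lambda w: w[col])
    # Strip the dummy characters again
    return [w.rstrip(PAD) for w in padded]


print(radix_sort_padded(['ac', 'a', 'ab']))  # ['a', 'ab', 'ac']
```

This builds a padded copy of every word, which is the extra work the corrected code avoids.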

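The idea behind the conditional expression in count_sort_letters (letter = ord(item[col]) - min_base if col < len(item) else 0) is to mimic the behavior of padding the strings without actually padding them, since padding is extra work. Based on col, it checks whether the column falls inside the string or in the part that would have been padding, and returns the matching index into the counting array: 0 for the dummy character, 1 to 26 for 'a' to 'z'.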
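The lookup can be pulled out into a small helper to see the mapping on its own. The name bucket_index is hypothetical and only used here; it assumes lowercase words.

```python
def bucket_index(item, col):
    """Return 0 for the virtual padding, 1..26 for 'a'..'z'."""
    if col >= len(item):
        return 0  # column lies past the end of the word
    return ord(item[col]) - ord('a') + 1


indices = [bucket_index(w, 1) for w in ['ac', 'a', 'ab']]
print(indices)  # [3, 0, 2]
```

The short word 'a' lands in bucket 0 at column 1, which places it ahead of 'ab' and 'ac' exactly as the '*' padding would.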

Regarding python - Radix sort for strings in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60968950/
