gpt4 book ai didi

python - 如何生成此自定义字母数字序列?

转载 作者:行者123 更新时间:2023-11-28 17:30:34 26 4
gpt4 key购买 nike

我想创建一个程序来生成一个特定的 7 长字符串。

它必须遵循以下规则:

  1. 0-9在a-z之前,在A-Z之前

  2. 长度为 7 个字符。

  3. 每个字符必须与两个关闭字符不同(不允许示例'NN')

  4. 我需要从 0000000 到 ZZZZZZZ 递增的所有可能组合,但不是随机序列

我已经用这段代码完成了:

from string import digits, ascii_uppercase, ascii_lowercase
from itertools import product

chars = digits + ascii_lowercase + ascii_uppercase

for n in range(7, 8):
for comb in product(chars, repeat=n):
if (comb[6] != comb[5] and comb[5] != comb[4] and comb[4] != comb[3] and comb[3] != comb[2] and comb[2] != comb[1] and comb[1] != comb[0]):
print ''.join(comb)

但它根本不高效,因为我必须等待很长时间才能进行下一次组合。

有人可以帮助我吗?

最佳答案

编辑:我已将解决方案更新为使用长度大于 4 的缓存短序列。这显着加快了计算速度。使用简单版本,生成所有长度为 7 的序列需要 18.5 小时,但使用新方法只需 4.5 小时。

我会让文档字符串完成所有描述解决方案的谈话。

"""
Problem:
Generate a string of N characters that only contains alphanumerical
characters. The following restrictions apply:
* 0-9 must come before a-z, which must come before A-Z
* it's valid to not have any digits or letters in a sequence
* no neighbouring characters can be the same
* the sequences must be in an order as if the string is base62, e.g.,
01010...01019, 0101a...0101z, 0101A...0101Z, 01020...etc

Solution:
Implement a recursive approach which discards invalid trees. For example,
for "---" start with "0--" and recurse. Try "00-", but discard it for
"01-". The first and last sequences would then be "010" and "ZYZ".

If the previous character in the sequence is a lowercase letter, such as
in "02f-", shrink the pool of available characters to a-zA-Z. Similarly,
for "9gB-", we should only be working with A-Z.

The input also allows to define a specific sequence to start from. For
example, for "abGH", each character will have access to a limited set of
its pool. In this case, the last letter can iterate from H to Z, at which
point it'll be free to iterate its whole character pool next time around.

When specifying a starting sequence, if it doesn't have enough characters
compared to `length`, it will be padded to the right with characters free
to explore their character pool. For example, for length 4, the starting
sequence "29" will be transformed to "29 ", where we will deal with two
restricted characters temporarily.

For long lengths the function internally calls a routine which relies on
fewer recursions and cached results. Length 4 has been chosen as optimal
in terms of precomputing time and memory demands. Briefly, the sequence is
broken into a remainder and chunks of 4. For each preceeding valid
subsequence, all valid following subsequences are fetched. For example, a
sequence of six would be split into "--|----" and for "fB|----" all
subsequences of 4 starting A, C, D, etc would be produced.

Examples:
>>> for i, x in enumerate(generate_sequences(7)):
... print i, x
0, 0101010
1, 0101012
etc

>>> for i, x in enumerate(generate_sequences(7, '012abcAB')):
... print i, x
0, 012abcAB
1, 012abcAC
etc

>>> for i, x in enumerate(generate_sequences(7, 'aB')):
... print i, x
0, aBABABA
1, aBABABC
etc
"""

import string

ALLOWED_CHARS = (string.digits + string.ascii_letters,
string.ascii_letters,
string.ascii_uppercase,
)
CACHE_LEN = 4

def _generate_sequences(length, sequence, previous=''):
char_set = ALLOWED_CHARS[previous.isalpha() * (2 - previous.islower())]
if sequence[-length] != ' ':
char_set = char_set[char_set.find(sequence[-length]):]
sequence[-length] = ' '
char_set = char_set.replace(previous, '')

if length == 1:
for char in char_set:
yield char
else:
for char in char_set:
for seq in _generate_sequences(length-1, sequence, char):
yield char + seq

def _generate_sequences_cache(length, sequence, cache, previous=''):
sublength = length if length == CACHE_LEN else min(CACHE_LEN, length-CACHE_LEN)
subseq = cache[sublength != CACHE_LEN]
char_set = ALLOWED_CHARS[previous.isalpha() * (2 - previous.islower())]
if sequence[-length] != ' ':
char_set = char_set[char_set.find(sequence[-length]):]
index = len(sequence) - length
subseq0 = ''.join(sequence[index:index+sublength]).strip()
sequence[index:index+sublength] = [' '] * sublength
if len(subseq0) > 1:
subseq[char_set[0]] = tuple(
s for s in subseq[char_set[0]] if s.startswith(subseq0))
char_set = char_set.replace(previous, '')

if length == CACHE_LEN:
for char in char_set:
for seq in subseq[char]:
yield seq
else:
for char in char_set:
for seq1 in subseq[char]:
for seq2 in _generate_sequences_cache(
length-sublength, sequence, cache, seq1[-1]):
yield seq1 + seq2

def precompute(length):
char_set = ALLOWED_CHARS[0]
if length > 1:
sequence = [' '] * length
result = {}
for char in char_set:
result[char] = tuple(char + seq for seq in _generate_sequences(
length-1, sequence, char))
else:
result = {char: tuple(char) for char in ALLOWED_CHARS[0]}
return result

def generate_sequences(length, sequence=''):
# -------------------------------------------------------------------------
# Error checking: consistency of the value/type of the arguments
if not isinstance(length, int):
msg = 'The sequence length must be an integer: {}'
raise TypeError(msg.format(type(length)))
if length < 0:
msg = 'The sequence length must be greater or equal than 0: {}'
raise ValueError(msg.format(length))
if not isinstance(sequence, str):
msg = 'The sequence must be a string: {}'
raise TypeError(msg.format(type(sequence)))
if len(sequence) > length:
msg = 'The sequence has length greater than {}'
raise ValueError(msg.format(length))
# -------------------------------------------------------------------------
if not length:
yield ''
else:
# ---------------------------------------------------------------------
# Error checking: the starting sequence, if provided, must be valid
if any(s not in ALLOWED_CHARS[0]+' ' for s in sequence):
msg = 'The sequence contains invalid characters: {}'
raise ValueError(msg.format(sequence))
if sequence.strip() != sequence.replace(' ', ''):
msg = 'Uninitiated characters in the middle of the sequence: {}'
raise ValueError(msg.format(sequence.strip()))
sequence = sequence.strip()
if any(a == b for a, b in zip(sequence[:-1], sequence[1:])):
msg = 'No neighbours must be the same character: {}'
raise ValueError(msg.format(sequence))
char_type = [s.isalpha() * (2 - s.islower()) for s in sequence]
if char_type != sorted(char_type):
msg = '0-9 must come before a-z, which must come before A-Z: {}'
raise ValueError(msg.format(sequence))
# ---------------------------------------------------------------------
sequence = list(sequence.ljust(length))
if length <= CACHE_LEN:
for s in _generate_sequences(length, sequence):
yield s
else:
remainder = length % CACHE_LEN
if not remainder:
cache = tuple((precompute(CACHE_LEN),))
else:
cache = tuple((precompute(CACHE_LEN), precompute(remainder)))
for s in _generate_sequences_cache(length, sequence, cache):
yield s

我在 generate_sequences() 函数中包含了彻底的错误检查。为了简洁起见,如果您可以保证调用该函数的人永远不会使用无效输入,则可以删除它们。具体来说,无效的起始序列。

统计特定长度的序列数

虽然该函数将按顺序生成序列,但我们可以执行一个简单的组合计算来计算总共存在多少个有效序列。

序列可以有效地分解为 3 个独立的子序列。一般来说,一个序列可以包含 0 到 7 个数字,后跟 0 到 7 个小写字母,然后是 0 到 7 个大写字母。只要这些总和为 7。这意味着我们可以拥有分区 (1, 3, 3),或 (2, 1, 3),或 (6, 0, 1) 等。我们可以使用 stars and bars计算将 N 的总和拆分为 k 个 bin 的各种组合。已经有一个 python 的实现,我们将借用它。前几个分区是:

[0, 0, 7]
[0, 1, 6]
[0, 2, 5]
[0, 3, 4]
[0, 4, 3]
[0, 5, 2]
[0, 6, 1]
...

接下来,我们需要计算一个分区中有多少个有效序列。由于数字子序列独立于小写字母,小写字母又独立于大写字母,我们可以单独计算它们并将它们相乘。

那么,对于 4 的长度,我们可以有多少种数字组合?第一个字符可以是 10 个数字中的任何一个,但第二个字符只有 9 个选项(十个减去前一个字符的数字)。第三个字母也类似,依此类推。所以有效子序列的总数是10*9*9*9。同样,对于长度为 3 的字母,我们得到 26*25*25。总的来说,对于分区,比如 (2, 3, 2),我们有 10*9*26*25*25*26*25 = 950625000 种组合。

import itertools as it

def partitions(n, k):
for c in it.combinations(xrange(n+k-1), k-1):
yield [b-a-1 for a, b in zip((-1,)+c, c+(n+k-1,))]

def count_subsequences(pool, length):
if length < 2:
return pool**length
return pool * (pool-1)**(length-1)

def count_sequences(length):
counts = [[count_subsequences(i, j) for j in xrange(length+1)] \
for i in [10, 26]]

print 'Partition {:>18}'.format('Sequence count')

total = 0
for a, b, c in partitions(length, 3):
subtotal = counts[0][a] * counts[1][b] * counts[1][c]
total += subtotal
print '{} {:18}'.format((a, b, c), subtotal)
print '\nTOTAL {:22}'.format(total)

总的来说,我们观察到虽然快速生成序列不是问题,但数量太多可能需要很长时间。长度 7 有 78550354750(785 亿)个有效序列,并且随着长度的增加,这个数字大约只增加了 25 倍。

关于python - 如何生成此自定义字母数字序列?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34669372/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com