python - Algorithm: which set of N tiles can be used to generate the maximum number of valid Scrabble words?

I'm trying to create a function best_tiles which takes the number of tiles in your hand and returns the set of tiles that allows you to produce the greatest number of unique English-valid words, assuming you can only use each tile once.
For example, with the set of tiles in your hand (A, B, C) you can produce the words CAB, BAC, AB, and BA (all of which are English words), so you can spell 4 unique words with that set. With (B, A, A), you can spell 5 words: ABA, BAA, AA, AB, and BA. The goal is to find the set of letters that allows you to spell the greatest number of English-valid words (without replacement).
So if 5 is the maximum number of words that any combination of letters of N = 3 can spell, running best_tiles(n = 3) would return B, A, A.
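A minimal brute-force scorer makes the counting above concrete. The toy word list here is a hypothetical stand-in for a real dictionary like enable.txt:

```python
from collections import Counter

def spellable(word, tiles):
    """True if `word` can be built from `tiles` without reusing a tile."""
    have = Counter(tiles)
    # The tiles must supply at least as many copies of each letter as the word needs.
    return all(have[ch] >= n for ch, n in Counter(word).items())

def count_words(tiles, word_list):
    return sum(spellable(w, tiles) for w in word_list)

# Toy word list (hypothetical subset of a real dictionary) just to check the counts.
toy = ["AA", "AB", "BA", "ABA", "BAA", "CAB", "BAC"]
print(count_words("BAA", toy))  # -> 5 (AA, AB, BA, ABA, BAA)
print(count_words("ABC", toy))  # -> 4 (AB, BA, CAB, BAC)
```

This matches the counts claimed above: (B, A, A) spells 5 of these words, (A, B, C) spells 4.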
I'm wondering how to implement this efficiently. My current approach doesn't scale well at all in the number of letters.
I read in a word list. In this case, I'm using enable.txt here: https://www.wordgamedictionary.com/enable/

path = "enable.txt"
words = []
with open(path, encoding='utf8') as f:
    for values in f:
        words.append(list(values.strip().upper()))
I created a function word_in_tiles (h/t smack89) that returns whether it is possible to construct a word given a set of tiles:

import collections

def word_in_tiles(word, tiles):
    # The tiles must contain at least as many copies of each letter as the word needs.
    tiles_counter = collections.Counter(tiles)
    return all(tiles_counter.get(ch, 0) >= cnt
               for ch, cnt in collections.Counter(word).items())
Then I created a function get_all_words which produces a list of all the possible words one can spell from a word list and a set of tiles:

def get_all_words(tile_set, words):
    # words is a word list
    return [i for i in words if word_in_tiles(i, tile_set)]
The extremely naive approach for determining which tile set of three letters is "best" is the following:
First I create a list of all possible combinations of a given length. So for length 3, I'd do:

import string
import itertools

letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))

# Create a list of only words that are three letters or fewer
three_letter = [i for i in words if len(i) <= 3]

sequence_dict = dict()
for i in all_three_letter_combinations:
    string_test = "".join(i).upper()
    sequence_dict[i] = get_all_words(string_test, three_letter)
Then I remove the entries with no spellable words and sort the rest by the number of results:

res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}

def GetMaxLength(res):
    return max((len(v), v, k) for k, v in res.items())[1:]

GetMaxLength(res)
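This works for N = 3, but a quick side calculation (not part of the original code) shows why enumeration cannot reach N = 15:

```python
import math

# combinations_with_replacement over a 26-letter alphabet enumerates
# C(26 + n - 1, n) multisets of size n.
def num_tile_sets(n):
    return math.comb(26 + n - 1, n)

print(num_tile_sets(3))   # 3276: small enough to brute-force
print(num_tile_sets(15))  # roughly 4e10 candidates
```

And each candidate still costs a full scan of the word list, which is why the answer below prunes the search space instead of enumerating it.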
This tells you that, for three letters, the tile set that produces the most English-valid words is T A E, which can produce the words ['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA']. I'd like to be able to scale this up to N = 15. What is the best procedure for doing this?

Best answer

I think this is good enough!
Here is a log of my code running under PyPy:

0:00:00.000232
E
0:00:00.001251
ER
0:00:00.048733
EAT
0:00:00.208744
ESAT
0:00:00.087425
ESATL
0:00:00.132049
ESARTP
0:00:00.380296
ESARTOP
0:00:01.409129
ESIARTLP
0:00:03.433526
ESIARNTLP
0:00:10.391252
ESIARNTOLP
0:00:25.651012
ESIARNTOLDP
0:00:56.642405
ESIARNTOLCDP
0:01:57.257293
ESIARNTOLCDUP
0:03:55.933906
ESIARNTOLCDUPM
0:07:17.146036
ESIARNTOLCDUPMG
0:10:14.844347
ESIARNTOLCDUPMGH
0:13:34.722600
ESIARNTOLCDEUPMGH
0:18:14.215019
ESIARNTOLCDEUPMGSH
0:22:47.129284
ESIARNTOLCDEUPMGSHB
0:27:56.859511
ESIARNTOLCDEUPMGSHBYK
0:46:20.448502
ESIARNTOLCDEUPMGSHBYAK
0:57:15.213635
ESIARNTOLCDEUPMGSHIBYAT
1:09:55.530180
ESIARNTOLCDEUPMGSHIBYATF
1:18:35.209599
ESIARNTOLCDEUPMGSHIBYATRF
1:21:54.095119
ESIARNTOLCDEUPMGSHIBYATRFV
1:20:16.978411
ESIARNTOLCDEUPMGSHIBYAOTRFV
1:14:24.253660
ESIARNTOLCDEUPMGSHIBYAONTRFV
1:00:37.405571
The key improvements are these.
  • I distinguish not just the letter, but how many times that letter has been seen. So each letter turns into a series of keys that I can either take or move past. This is an idea I got while commenting on David Eisenstat's solution.
  • From him I also got the idea that pruning subtrees that cannot lead to an answer does a good job of controlling the growth of the problem.
  • The very first solution I consider is simply all of the top letters. This starts as a good solution, so even though the search is depth-first, we will prune fairly well.
  • I am careful to consolidate "exhausted tries" into a single record. This reduces the amount of data that we have to throw away.
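The first bullet's trick, sketched in isolation: every occurrence of a letter gets its own key, so a word becomes a multiset of (letter, occurrence_index) pairs, and a rare key like a second 'E' can be handled separately from a common one (a minimal illustration, not part of the solver):

```python
from collections import Counter

# Map a word to keys (letter, k), meaning "the (k+1)-th copy of letter".
# A second 'E' becomes the distinct key ('E', 1), which is far rarer than
# ('E', 0), so ordering keys by frequency lets the search prune well.
def word_keys(word):
    seen = Counter()
    keys = []
    for letter in word:
        keys.append((letter, seen[letter]))
        seen[letter] += 1
    return keys

print(word_keys("LETTER"))
# [('L', 0), ('E', 0), ('T', 0), ('T', 1), ('E', 1), ('R', 0)]
```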

Here is the code:
import datetime

path = "enable.txt"
words = []
with open(path) as f:
    for values in f:
        words.append(values.strip().upper())

key_count = {}
for word in words:
    seen = {}
    for letter in word:
        if letter not in seen:
            seen[letter] = 0
        key = (letter, seen[letter])
        if key not in key_count:
            key_count[key] = 1
        else:
            key_count[key] += 1
        seen[letter] += 1


KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
    KEY_POS[KEYS[i]] = i

# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK: we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely, so we can deal with that efficiently.
class Trie:
    def __init__(self, path):
        self.words = []
        self.dict = {}
        self.min_pos = -1
        self.max_pos = -1
        self.count_words = 0
        self.path = path

    def add_word(self, word):
        trie = self

        poses = []
        seen = {}
        for letter in word:
            if letter not in seen:
                seen[letter] = 0
            key = (letter, seen[letter])
            poses.append(KEY_POS[key])
            seen[letter] += 1
        sorted_poses = sorted(poses)
        for i in range(len(sorted_poses)):
            trie.count_words += 1
            pos = sorted_poses[i]
            if pos not in trie.dict:
                trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
                if trie.max_pos < pos:
                    trie.max_pos = pos
            trie = trie.dict[pos]
        trie.count_words += 1
        trie.words.append(word)


base_trie = Trie('')
for word in words:
    base_trie.add_word(word)

def best_solution(size):
    def solve(subset, pos, best, partial):
        found = sum(x[0] for x in partial)
        upper_bound = sum(x[1] for x in partial)
        if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
            return (found, subset)
        if best < found:
            best = found
        # Figure out our next calculations.
        partial_include = []
        partial_exclude = []
        finalized_found = 0
        for this_found, this_bound, this_trie in partial:
            if this_trie is None:
                # This is a generic record of already emptied tries
                finalized_found += this_found
            elif pos in this_trie.dict:
                include_trie = this_trie.dict[pos]
                partial_include.append((
                    this_found + len(include_trie.words),
                    include_trie.count_words + this_found,
                    include_trie
                ))
                # We included the tally of found words in the previous partial.
                # So do not double-count by including it again
                partial_include.append((
                    0,
                    this_bound - include_trie.count_words - this_found,
                    this_trie
                ))
                partial_exclude.append((
                    this_found,
                    this_bound - include_trie.count_words,
                    this_trie
                ))
            elif this_found == this_bound:
                finalized_found += this_found
            else:
                partial_include.append((
                    this_found,
                    this_bound,
                    this_trie
                ))
                partial_exclude.append((
                    this_found,
                    this_bound,
                    this_trie
                ))
        if 0 < finalized_found:
            partial_include.append(
                (finalized_found, finalized_found, None)
            )
            partial_exclude.append(
                (finalized_found, finalized_found, None)
            )

        found_include, subset_include = solve(subset + [pos], pos + 1, best, partial_include)
        if best < found_include:
            best = found_include
        found_exclude, subset_exclude = solve(subset, pos + 1, best, partial_exclude)
        if found_include < found_exclude:
            return (found_exclude, subset_exclude)
        else:
            return (found_include, subset_include)

    count, subset = solve([], 0, 0, [(len(base_trie.words), base_trie.count_words, base_trie)])
    return ''.join([KEYS[x][0] for x in subset])

for i in range(20):
    start = datetime.datetime.now()
    print(best_solution(i))
    print(datetime.datetime.now() - start)
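The pruning the solver relies on can be shown in miniature: a depth-first search that carries an optimistic upper bound and abandons any branch whose bound cannot beat the best complete solution found so far. This is a generic sketch of the pattern (picking at most k values to maximize their sum), not the solver itself:

```python
def max_value(values, k):
    """Maximize the sum of at most k chosen values, with bound-based pruning."""
    values = sorted(values, reverse=True)
    best = 0

    def solve(i, taken, total):
        nonlocal best
        # Optimistic bound: also take the next (k - taken) largest remaining values.
        bound = total + sum(values[i:i + (k - taken)])
        if bound <= best:
            return  # prune: this branch cannot beat the incumbent
        if taken == k or i == len(values):
            best = max(best, total)
            return
        solve(i + 1, taken + 1, total + values[i])  # include values[i]
        solve(i + 1, taken, total)                  # exclude values[i]

    solve(0, 0, 0)
    return best

print(max_value([5, 3, 8, 1], 2))  # -> 13 (8 + 5)
```

In the real solver, `upper_bound` plays the role of `bound` (the count of words still reachable through the remaining tries) and `found` the role of `total`, and sorting KEYS by frequency makes the first depth-first solution strong enough that most branches are cut early.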

On "python - Algorithm: which set of N tiles can be used to generate the maximum number of valid Scrabble words?", see the original question on Stack Overflow: https://stackoverflow.com/questions/64214011/
