gpt4 book ai didi

python - 使用 Levenshtein 距离作为 Python 中的启发式算法生成字符串的爬山算法?

转载 作者:塔克拉玛干 更新时间:2023-11-03 04:23:57 26 4
gpt4 key购买 nike

我一直在关注这个ebook我被他们的一个 self 检查问题困住了,问题是这样的:

Self Check

Here’s a self check that really covers everything so far. You may have heard of the infinite monkey theorem? The theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare. Well, suppose we replace a monkey with a Python function. How long do you think it would take for a Python function to generate just one sentence of Shakespeare? The sentence we’ll shoot for is: “methinks it is like a weasel”

You’re not going to want to run this one in the browser, so fire up your favorite Python IDE. The way we’ll simulate this is to write a function that generates a string that is 27 characters long by choosing random letters from the 26 letters in the alphabet plus the space. We’ll write another function that will score each generated string by comparing the randomly generated string to the goal.

A third function will repeatedly call generate and score, then if 100% of the letters are correct we are done. If the letters are not correct then we will generate a whole new string.To make it easier to follow your program’s progress this third function should print out the best string generated so far and its score every 1000 tries.

Self Check Challenge

See if you can improve upon the program in the self check by keeping letters that are correct and only modifying one character in the best string so far. This is a type of algorithm in the class of ‘hill climbing’ algorithms, that is we only keep the result if it is better than the previous one.

我编写了一些代码,使用生成的字符串和所需字符串之间的 Levenshtein 距离来完成这个挑战的第一部分。

import random, string, nltk

def string_generator(length, collection):
"""
takes all characters in collection and generates a string of size length.
"""
return ''.join([random.choice(collection) for _ in xrange(length)])

def string_tester(output, text):
"""
compares strings given and returns the Levenshtein distance.
"""
return nltk.metrics.edit_distance(output, text)

if __name__ == '__main__':
collection = [x for x in (string.ascii_lowercase + ' ')]
longest_distance = 27
best_string = None
ctr = 0
while True:
random_string = string_generator(26, collection)
distance = string_tester(random_string, "methinks it is like a weasel")
ctr += 1
ctr %= 1000
if distance < longest_distance:
best_string = random_string
longest_distance = distance
# end if the string generated is same as the one given
if longest_distance == 0:
print best_string
print longest_distance
break
# use the best string to generate a better string every 1000th time
if ctr == 0:
print longest_distance
print best_string
# TODO: optimization here

我不知道如何在 TODO 处生成更好的字符串 - 在迭代和给定方法之前使用最佳字符串。

tl;dr: 我如何编写一个使用编辑距离作为其启发式算法的爬山算法,直到它生成特定的字符串?请概述过程。

最佳答案

target_word = "hello world"
target_size = len(target_word)
alphabet = "abcdefghijklmnopqrstuvwxyz "
def make_guess(alphabet,size):
return "".join(random.choice(alphabet) for _ in range(size))

guess = make_guess(alphabet,target_size)

for i in itertools.count(0):
if guess == target_word:
break;
if not i % 1000:
print "Best So Far:",guess
#climb hill and replace our guess if our next guess is better
guess = min(guess,make_guess(alphabet,target_size),key=lambda _guess:levenstein(_guess,target_word))
print "Final Guess:",guess

这称为爬山,因为如果下一个潜在解决方案更好,则该潜在解决方案会被替换。这可能会导致其他类型的问题陈述出现问题,您会在其中找到表现相对较好的局部最大值或最小值,但您会错过全局最大值或最小值

关于python - 使用 Levenshtein 距离作为 Python 中的启发式算法生成字符串的爬山算法?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20689106/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com