
python - Generating random sentences from custom text in Python's NLTK?

Reposted. Author: 太空狗. Updated: 2023-10-29 21:20:30

I'm having trouble with NLTK under Python, specifically the .generate() method.

generate(self, length=100)

Print random text, generated using a trigram language model.

Parameters:

   * length (int) - The length of text to generate (default=100)
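The docstring says the text is generated with a trigram language model. Such a model estimates the probability of each word given the two words before it, using counts from the training text. The sketch below computes that estimate with plain maximum-likelihood counts from `collections.Counter` in Python 3. It illustrates the idea only and is not NLTK's implementation, which may apply smoothing. The helper name `trigram_probabilities` is hypothetical.

```python
from collections import Counter


def trigram_probabilities(tokens):
    """Return {(w1, w2): {w3: P(w3 | w1, w2)}} using maximum-likelihood counts."""
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count each (w1, w2) context only at positions where a third word follows it.
    context_counts = Counter(zip(tokens, tokens[1:-1]))
    probs = {}
    for (w1, w2, w3), count in trigram_counts.items():
        probs.setdefault((w1, w2), {})[w3] = count / context_counts[(w1, w2)]
    return probs


tokens = "the cat sat on the mat the cat ran".split()
probs = trigram_probabilities(tokens)
print(probs[("the", "cat")])  # {'sat': 0.5, 'ran': 0.5}
```

The context ("the", "cat") occurs twice and is followed once by "sat" and once by "ran", so each continuation gets probability 0.5.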

Here is a simplified version of what I'm trying:

import nltk

words = 'The quick brown fox jumps over the lazy dog'
tokens = nltk.word_tokenize(words)
text = nltk.Text(tokens)
print(text.generate(3))

This always produces:

Building ngram index...
The quick brown
None

rather than a random phrase built from the words.

Here is my output when I use the default length:

print(text.generate())

Building ngram index...
The quick brown fox jumps over the lazy dog fox jumps over the lazy
dog dog The quick brown fox jumps over the lazy dog dog brown fox
jumps over the lazy dog over the lazy dog The quick brown fox jumps
over the lazy dog fox jumps over the lazy dog lazy dog The quick brown
fox jumps over the lazy dog the lazy dog The quick brown fox jumps
over the lazy dog jumps over the lazy dog over the lazy dog brown fox
jumps over the lazy dog quick brown fox jumps over the lazy dog The
None

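Again, it starts with the same text but then varies. I have also tried it with the first chapter of Orwell's 1984. Again, it always starts with the first 3 tokens (one of which, in this case, is a space) and then goes on to generate random text.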
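One plausible explanation for the fixed opening rests on an assumption: that the older NLTK trigram model padded the start of the text with empty-string tokens. That assumption would match the "space" among the first three tokens. In that case, each padded context near the start has exactly one observed successor, so the first words generated are forced. The sketch below is a hypothetical illustration using a made-up helper, `successors_with_padding`. It is not NLTK code.

```python
def successors_with_padding(tokens, pad=""):
    """Map each two-word context to the words seen after it, after left-padding.

    Hypothetical helper; the empty-string padding is an assumption about the
    older NLTK model, used here only for illustration.
    """
    padded = [pad, pad] + tokens
    table = {}
    for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
        table.setdefault((w1, w2), []).append(w3)
    return table


tokens = "The quick brown fox jumps over the lazy dog".split()
table = successors_with_padding(tokens)
print(table[("", "")])          # ['The']
print(table[("", "The")])       # ['quick']
print(table[("The", "quick")])  # ['brown']
```

In this nine-word sentence every context has a single successor. Any randomness can therefore only come from a context with no recorded successor, such as ("lazy", "dog"). Presumably the model then falls back on a shorter context and picks more freely.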

What am I doing wrong here?

Best answer

To generate random text, you need to use Markov chains.

Code to do this, from here:

import random


class Markov(object):

    def __init__(self, open_file):
        self.cache = {}
        self.open_file = open_file
        self.words = self.file_to_words()
        self.word_size = len(self.words)
        self.database()

    def file_to_words(self):
        # Read the whole file and split it into words on whitespace.
        self.open_file.seek(0)
        data = self.open_file.read()
        words = data.split()
        return words

    def triples(self):
        """ Generates triples from the given data string. So if our string were
        "What a lovely day", we'd generate (What, a, lovely) and then
        (a, lovely, day).
        """

        if len(self.words) < 3:
            return

        for i in range(len(self.words) - 2):
            yield (self.words[i], self.words[i+1], self.words[i+2])

    def database(self):
        # Map each pair of consecutive words to the list of words that follow it.
        for w1, w2, w3 in self.triples():
            key = (w1, w2)
            if key in self.cache:
                self.cache[key].append(w3)
            else:
                self.cache[key] = [w3]

    def generate_markov_text(self, size=25):
        # Start at a random position, so the output does not always begin
        # with the first words of the text. Requires at least 3 words.
        seed = random.randint(0, self.word_size-3)
        w1, w2 = self.words[seed], self.words[seed+1]
        gen_words = [w1, w2]
        while len(gen_words) < size:
            successors = self.cache.get((w1, w2))
            # The last word pair of the text has no successor; stop there
            # instead of raising KeyError.
            if not successors:
                break
            w1, w2 = w2, random.choice(successors)
            gen_words.append(w2)
        return ' '.join(gen_words)
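The class above reads its input from an open file. Below is a minimal self-contained sketch of the same word-pair Markov chain that works directly on a string. It assumes a `random.Random` instance can be passed in, which is a choice of this sketch and not of the original code, so that a run can be reproduced with a fixed seed. The function name `markov_text` is hypothetical.

```python
import random


def markov_text(text, size=25, rng=None):
    """Generate up to `size` words from a second-order (word-pair) Markov chain."""
    if rng is None:
        rng = random.Random()
    words = text.split()
    if len(words) < 3:
        return " ".join(words[:size])
    # Map each consecutive word pair to the list of words that follow it.
    cache = {}
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        cache.setdefault((w1, w2), []).append(w3)
    # Start at a random position rather than always at the first two words.
    start = rng.randint(0, len(words) - 3)
    w1, w2 = words[start], words[start + 1]
    output = [w1, w2]
    while len(output) < size:
        followers = cache.get((w1, w2))
        if not followers:  # the final pair of the text has nothing after it
            break
        w1, w2 = w2, rng.choice(followers)
        output.append(w2)
    return " ".join(output[:size])


sample = "The quick brown fox jumps over the lazy dog and the lazy cat sleeps"
print(markov_text(sample, size=8, rng=random.Random(1)))
```

Every three consecutive words in the output also occur consecutively in the input. The randomness comes from the starting position and from contexts with several recorded successors, such as ("the", "lazy") here.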

Explanation: Generating pseudo random text with Markov chains using Python

For the question "python - Generating random sentences from custom text in Python's NLTK?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1150144/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP registration No. 2022000587