
python - Counting bigrams from user input in Python 3?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:55:05

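I'm stuck and need a little guidance. I'm teaching myself Python with Grok Learning. Below are the problem, the sample output, and how far I've got with my code. I'd appreciate any tips that help me solve this.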

In linguistics, a bigram is a pair of adjacent words in a sentence. The sentence "The big red ball." has three bigrams: The big, big red, and red ball.

Write a program to read in multiple lines of input from the user, where each line is a space-separated sentence of words. Your program should then count up how many times each of the bigrams occurs across all input sentences. The bigrams should be treated in a case insensitive manner by converting the input lines to lowercase. Once the user stops entering input, your program should print out each of the bigrams that appear more than once, along with their corresponding frequencies. For example:

Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line:
near the: 2
red ball: 2
the big: 3
big red: 3

My code hasn't got very far and I'm really stuck, but here's where I am:

words = set()
line = input("Line: ")
while line != '':
    words.add(line)
    line = input("Line: ")

Am I on the right track? I'm trying not to import any modules and to use only built-in features.

Thanks, Jeff

Best Answer

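Let's start with a function that takes a sentence (possibly with punctuation) and returns a list of all the lowercase bigrams found in it.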

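So we first need to replace every non-alphanumeric character in the sentence with a space, convert all letters to lowercase, and then split the sentence on whitespace into a list of words: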

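import re

def bigrams(sentence):
    # Lowercase, then replace every non-word character with a space
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    # Pair each word with its successor; list() because zip() is lazy in Python 3
    return list(zip(words, words[1:]))

We use the standard-library re module (not a built-in, but always available) to replace non-alphanumeric characters with spaces via a regular expression, and the built-in zip function to pair consecutive words. (We pair the word list with the same list shifted by one element.) In Python 3, zip returns a lazy iterator, so its result is wrapped in list() to get the lists shown in the examples below.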
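To see why the shifted pairing works, here is a small sketch on a hard-coded sentence (the variable names are just for illustration). zip stops at the end of the shorter list, so n words always give n - 1 pairs, and a sentence of zero or one word gives none.

```python
# Pair a word list with itself shifted by one element.
words = "the big red ball".split()

# words[1:] drops the first word, so zip() lines up each word with the next one
# and stops when the shorter (shifted) list runs out.
pairs = list(zip(words, words[1:]))
print(pairs)  # [('the', 'big'), ('big', 'red'), ('red', 'ball')]

# Edge case: a one-word sentence produces no bigrams at all.
single = ["ball"]
print(list(zip(single, single[1:])))  # []
```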

Now we can test it:

>>> bigrams("The big red ball")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> bigrams("THE big, red, ball.")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> bigrams(" THE big,red,ball!!?")
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
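One side effect of the \W substitution is worth knowing: \W matches anything that is not a letter, digit, or underscore, so apostrophes and hyphens become word separators while underscores survive. A quick sketch, where words_of is a hypothetical helper that isolates just the cleanup step:

```python
import re

def words_of(sentence):
    # Same cleanup as bigrams(): lowercase, non-word characters -> spaces, split.
    return re.sub(r'\W', ' ', sentence.lower()).split()

print(words_of("Don't stop"))             # ['don', 't', 'stop']
print(words_of("well-known snake_case"))  # ['well', 'known', 'snake_case']
```

If the input may contain contractions, a different pattern (or no punctuation stripping at all, since the exercise says words are space-separated) might be preferable.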

Next, to count the bigrams found across all the sentences, you can use collections.Counter from the standard library.

For example:

from collections import Counter

counts = Counter()
for line in ["The big red ball", "The big red ball is near the big red box", "I am near the box"]:
    counts.update(bigrams(line))

We get:

>>> counts
Counter({('the', 'big'): 3, ('big', 'red'): 3, ('red', 'ball'): 2, ('near', 'the'): 2, ('ball', 'is'): 1, ('is', 'near'): 1, ('red', 'box'): 1, ('i', 'am'): 1, ('am', 'near'): 1, ('the', 'box'): 1})
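The loop above relies on Counter.update() adding to existing counts rather than replacing them the way dict.update() does. A small check of that behaviour:

```python
from collections import Counter

counts = Counter()
# Given an iterable, Counter.update() adds 1 per element to the existing counts...
counts.update([('the', 'big'), ('big', 'red')])
counts.update([('the', 'big')])
print(counts[('the', 'big')])  # 2

# ...whereas dict.update() simply overwrites the stored value.
plain = {('the', 'big'): 1}
plain.update({('the', 'big'): 1})
print(plain[('the', 'big')])   # 1

# A bigram that was never seen reads as 0 instead of raising KeyError.
print(counts[('red', 'box')])  # 0
```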

Now we just need to print the ones that appear more than once:

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))
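If a predictable order is wanted (the exercise does not seem to require one), Counter.most_common() can drive the loop instead of items(): with no argument it returns every (bigram, count) pair sorted by count, highest first. A sketch with the counts hard-coded for illustration:

```python
from collections import Counter

counts = Counter({('the', 'big'): 3, ('big', 'red'): 3, ('red', 'ball'): 2,
                  ('near', 'the'): 2, ('red', 'box'): 1})

# Unpack each bigram tuple directly and keep only those seen more than once.
lines = [f"{w1} {w2}: {cnt}" for (w1, w2), cnt in counts.most_common() if cnt > 1]
print("\n".join(lines))
```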

Putting it all together, with a loop for user input instead of a fixed list:

import re
from collections import Counter

def bigrams(sentence):
    # Lowercase, replace non-word characters with spaces, then pair adjacent words
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    return list(zip(words, words[1:]))

counts = Counter()
while True:
    line = input("Line: ")
    if not line:  # an empty line ends the input
        break
    counts.update(bigrams(line))

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))

Output (the order in which the bigrams are printed may differ between Python versions):

Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line:
near the: 2
red ball: 2
big red: 3
the big: 3
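Since the question asks to avoid imports, here is a sketch of the same idea using only built-ins: a dict with dict.get() in place of Counter, and plain str.split() in place of the regex. It assumes, as the exercise states, that words are separated by spaces, so punctuation is not stripped; the helper names are illustrative.

```python
def bigrams(sentence):
    # Lowercase and split on whitespace; no punctuation handling, so no `re` needed.
    words = sentence.lower().split()
    return zip(words, words[1:])

def count_bigrams(lines):
    counts = {}
    for line in lines:
        for pair in bigrams(line):
            # dict.get() supplies 0 the first time a bigram is seen.
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def read_lines(prompt="Line: "):
    lines = []
    while True:
        try:
            line = input(prompt)
        except EOFError:  # also stop cleanly if input runs out
            break
        if not line:      # an empty line ends the input
            break
        lines.append(line)
    return lines

if __name__ == "__main__":
    for (w1, w2), cnt in count_bigrams(read_lines()).items():
        if cnt > 1:
            print(f"{w1} {w2}: {cnt}")
```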

The original question, "python - Counting bigrams from user input in Python 3?", is on Stack Overflow: https://stackoverflow.com/questions/45638131/
