python - Getting random sentences from a set of text files

Reposted. Author: 行者123. Updated: 2023-11-30 22:58:55

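I'm not much of a programmer, but I've been asked to help one of my professors with a research project that needs a basic level of programming knowledge. We are extracting 1000 sentences from 22 text files. Each text file contains thousands of sentences, all strung together in a single paragraph, like this: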

By this time , the resistance was severe . Why ? Consider the houses that were coming down , to be replaced by modern flats . They are typical of the row houses for workers of the dreary industrial towns of England . These " cottages , " as they were called in Sunderland , were built for the working class by unnamed builders one hundred years ago . Many were rented , but most were by this time owned by their inhabitants . The houses sit directly on the sidewalk . There are no trees on the street . There is no front yard .

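Pretty ugly. Anyway, I'm not sure how to pull random sentences out of a block of text like this. I'm fairly confident these are the steps my program has to follow: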

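  1. Pick a random text file from the set of 22 text files.
  2. Pick a random sentence from that text file.
    • I think this could be done by choosing a random number, "counting" full stops (.) up to that number, then taking the sentence after that full stop as the random sentence and stopping at the next full stop.
  3. Write the selected sentence to a separate text file.
  4. Repeat 999 more times.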

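Here is the code I have written so far. It doesn't "count" full stops the way I described above, because I don't know how to do that, but I tried to outline the basic parts I do know. I know my code is pretty ugly, but I've never really programmed anything before. Thanks for your help!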

fileNumber = 0
sentenceNumber = 0
i = 0
fname = "x"
sentence = "x"

for i < 1000

    fileNumber = random.randint(0, 22) #chooses a random number in order to assign one of the text files
    sentenceNumber = random.randint(0, NUMBER OF FULLSTOPS) #chooses a random number to select a random sentence from the previously selected text file

    if fileNumber == 0 #assigns the file to search based on the random number stored in fileNumber
        fname = "w_acad_1990.txt"
    else if fileNumber == 1
        fname = "w_acad_1991.txt"
    else if fileNumber == 2
        fname = "w_acad_1992.txt"
    else if fileNumber == 3
        fname = "w_acad_1993.txt"
    else if fileNumber == 4
        fname = "w_acad_1994.txt"
    else if fileNumber == 5
        fname = "w_acad_1995.txt"
    else if fileNumber == 6
        fname = "w_acad_1996.txt"
    else if fileNumber == 7
        fname = "w_acad_1997.txt"
    else if fileNumber == 8
        fname = "w_acad_1998.txt"
    else if fileNumber == 9
        fname = "w_acad_1999.txt"
    else if fileNumber == 10
        fname = "w_acad_2000.txt"
    else if fileNumber == 11
        fname = "w_acad_2001.txt"
    else if fileNumber == 12
        fname = "w_acad_2002.txt"
    else if fileNumber == 13
        fname = "w_acad_2003.txt"
    else if fileNumber == 14
        fname = "w_acad_2004.txt"
    else if fileNumber == 15
        fname = "w_acad_2005.txt"
    else if fileNumber == 16
        fname = "w_acad_2006.txt"
    else if fileNumber == 17
        fname = "w_acad_2007.txt"
    else if fileNumber == 18
        fname = "w_acad_2008.txt"
    else if fileNumber == 19
        fname = "w_acad_2009.txt"
    else if fileNumber == 20
        fname = "w_acad_2010.txt"
    else if fileNumber == 21
        fname = "w_acad_2011.txt"
    else if fileNumber == 22
        fname = "w_acad_2012.txt"
    else
        fname = "x"

    #select a random sentence
    #write that sentence to a text file

    i++

Best answer

I would suggest something like this:

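import re
from random import sample

sentences = []
# The files in the question are named by year: w_acad_1990.txt to w_acad_2012.txt
for year in range(1990, 2013):
    with open('w_acad_{}.txt'.format(year)) as f:
        # Non-greedy match up to and including the closing punctuation
        sentences += re.findall(r".*?[\.\!\?]+", f.read())

# Draw 1000 distinct sentences from the pooled list
selected = sample(sentences, 1000)
with open('out.txt', 'w') as f:
    # Each match keeps its leading space, so a plain join still separates sentences
    f.write(''.join(selected))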

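First, if you choose the file before choosing the sentence, the sentences do not all have the same probability of being picked: a sentence in a short file is more likely to be chosen than a sentence in a long one. It is better to collect all the sentences before making the selection.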
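To make the bias concrete, here is a small sketch that compares the two schemes. The file names and sentence counts (10 and 1000) are made up for illustration and are not taken from the actual corpus.

```python
from fractions import Fraction

# Hypothetical sentence counts for two files (illustrative numbers only)
counts = {"short.txt": 10, "long.txt": 1000}

def file_first_probability(counts, fname):
    """Chance of one specific sentence in fname when a file is picked
    uniformly first and a sentence is then picked uniformly inside it."""
    return Fraction(1, len(counts)) * Fraction(1, counts[fname])

def pooled_probability(counts):
    """Chance of one specific sentence when all sentences are pooled first."""
    return Fraction(1, sum(counts.values()))

p_short = file_first_probability(counts, "short.txt")
p_long = file_first_probability(counts, "long.txt")
p_pooled = pooled_probability(counts)

print(p_short, p_long, p_pooled)  # 1/20 1/2000 1/1010
```

With these assumed sizes, a sentence in the short file is 100 times more likely to be drawn than one in the long file under the file-first scheme, while pooling gives every sentence the same chance.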

Don't open the files 1000 times. That is expensive! Read each file once instead.

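Don't use split as others have suggested, because you will lose the delimiter. Here I use findall from the re module, so I can match several delimiters (not only ., but also ! and ?), and the delimiter stays attached to each sentence I retrieve.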
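The difference between the two approaches can be seen on a short sample string. This is only a sketch: the sample text is invented in the style of the corpus, and the final strip step is an optional extra that is not part of the answer above.

```python
import re

text = "There are no trees on the street . Why ? Stop !"

# str.split drops the delimiter and only splits on one delimiter at a time
pieces = text.split(".")
print(pieces)     # ['There are no trees on the street ', ' Why ? Stop !']

# re.findall keeps the closing punctuation and accepts ., ! and ?
sentences = re.findall(r".*?[.!?]+", text)
print(sentences)  # ['There are no trees on the street .', ' Why ?', ' Stop !']

# Each match keeps its leading space; strip it if clean lines are wanted
cleaned = [s.strip() for s in sentences]
print(cleaned)    # ['There are no trees on the street .', 'Why ?', 'Stop !']
```

Note that this pattern treats every full stop as a sentence end, so abbreviations such as "Mr ." would be cut into separate pieces, and any trailing text without closing punctuation is not returned.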

Finally, you can use sample from the random module to select the 1000 sentences.
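As a quick illustration of how random.sample behaves, the sketch below uses stand-in data rather than the real corpus. It shows that the selection is made without replacement, and that asking for more items than the list holds raises ValueError, which is what would happen if the files contained fewer than 1000 sentences in total.

```python
import random

# Stand-in data: 50 dummy sentences instead of the real corpus
sentences = ["Sentence {} .".format(n) for n in range(50)]

random.seed(0)  # fixed seed only so this sketch is repeatable
picked = random.sample(sentences, 10)

# sample draws without replacement, so no sentence appears twice
no_duplicates = len(set(picked)) == len(picked)

# Requesting more items than are available raises ValueError
too_many = False
try:
    random.sample(sentences, 51)
except ValueError:
    too_many = True

print(len(picked), no_duplicates, too_many)  # 10 True True
```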

Regarding python - Getting random sentences from a set of text files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35973422/
