
python - Get the whole word if a few characters are present in the contents of a text file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:13:23

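I have a text file with more than 100 paragraphs. I want to find and list the words that contain a particular string.

Here is the content of my text file: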

A computer is a general purpose device that can be programmed to carry out a set of arithmetic or logical operations automatically. Since a sequence of operations can be readily changed, the computer can solve more than one kind of problem.

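I want to retrieve the words that contain "ra". For the text above it should return: general, programmed, operations.

Here is my code: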

with open('computer.txt', 'r') as searchfile:
    for line in searchfile:
        # Skip lines that cannot contain a match
        if "ra" in line:
            line_split = line.split(' ')
            for each in line_split:
                if "ra" in each:
                    print(each)

What is the most efficient way to do this?

Best answer

A regular expression works well here:

>>> import re
>>> r = re.compile(r"\b\w*ra\w*\b")
>>> r.findall("A computer is a general purpose device that can be programmed to carry out a set of arithmetic or logical operations automatically. Since a sequence of operations can be readily changed, the computer can solve more than one kind of problem.")
['general', 'programmed', 'operations', 'operations']
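The session above runs the pattern on a string literal, while the question starts from a file. A minimal sketch of applying the same pattern line by line follows. The helper name `find_words` is hypothetical (it is not part of the original answer), and `io.StringIO` stands in for the opened file so the example runs without `computer.txt` on disk.

```python
import io
import re


def find_words(lines, fragment):
    """Return every word containing `fragment`, in order of appearance."""
    # re.escape keeps characters such as "." or "+" in the fragment literal.
    pattern = re.compile(r"\b\w*" + re.escape(fragment) + r"\w*\b")
    matches = []
    for line in lines:
        matches.extend(pattern.findall(line))
    return matches


# io.StringIO stands in for open('computer.txt'); any iterable of lines works.
sample = io.StringIO(
    "A computer is a general purpose device that can be programmed\n"
    "to carry out a set of arithmetic or logical operations.\n"
)
print(find_words(sample, "ra"))  # ['general', 'programmed', 'operations']
```

Because the function only iterates over its argument, passing the file object from `with open('computer.txt') as searchfile:` should behave the same way without loading the whole file into memory.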

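This list contains duplicates, which you can remove with a simple set() call (that in turn discards the order of the elements, so if you need to preserve it you will have to do a little more work).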
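One possible way to do that extra work, sketched here as an assumption rather than as part of the original answer, is `dict.fromkeys`: dictionaries keep insertion order in Python 3.7 and later, so the first occurrence of each word keeps its position.

```python
import re

text = (
    "A computer is a general purpose device that can be programmed to carry "
    "out a set of arithmetic or logical operations automatically. Since a "
    "sequence of operations can be readily changed, the computer can solve "
    "more than one kind of problem."
)

words = re.findall(r"\b\w*ra\w*\b", text)

# set() removes duplicates but does not guarantee any particular order.
unique_unordered = set(words)

# dict.fromkeys() removes duplicates and keeps first-seen order (Python 3.7+).
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)  # ['general', 'programmed', 'operations']
```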

Note that the regex is rather naive about what it considers a "word":

\b    # Start of an alphanumeric word
\w*   # Match any number of word characters ([A-Za-z0-9_]; in Python 3 also Unicode letters and digits unless re.ASCII is set)
ra    # Match ra
\w*   # Match any number of word characters
\b    # End of a word
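A few cases where that naive definition shows, as a small sketch. The sample strings are made up for illustration and do not come from the question's file.

```python
import re

pattern = re.compile(r"\b\w*ra\w*\b")

# A hyphen is not a word character, so only the half containing "ra" is returned.
hyphenated = pattern.findall("an ultra-fast drive")
print(hyphenated)  # ['ultra']

# Underscores and digits are word characters, so they stay part of the match.
identifier = pattern.findall("frame_rate2 is set")
print(identifier)  # ['frame_rate2']

# Matching is case-sensitive unless re.IGNORECASE is passed.
case_sensitive = pattern.findall("RAM and Radar")
case_insensitive = re.findall(r"\b\w*ra\w*\b", "RAM and Radar", re.IGNORECASE)
print(case_sensitive)    # []
print(case_insensitive)  # ['RAM', 'Radar']
```

Whether these behaviours matter depends on the text being searched. For plain English prose like the sample paragraph, the simple pattern is usually enough.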

Regarding "python - Get the whole word if a few characters are present in the contents of a text file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26273796/
