
Python: how do I rejoin hyphenated words that were split across line breaks?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:55:20

I want to say that Napp Granade
serves in the spirit of a town in our dis-
trict of Georgia called Andersonville.

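I have thousands of text files containing data like the above, in which words have been wrapped across lines with a hyphen and a line break.

What I want to do is remove the hyphen and move the line break to the end of the rejoined word. If possible, I don't want to touch every hyphenated word, only those that are split at the end of a line.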

import re

with open(filename, encoding="utf8") as f:
    file_str = f.read()

re.sub("\s*-\s*", "", file_str)

with open(filename, "w", encoding="utf8") as f:
    f.write(file_str)

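The code above doesn't work, and I have tried several different approaches.

I want to go through the whole text file and remove every hyphen that marks a line break, so that the result looks like this: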

I want to say that Napp Granade
serves in the spirit of a town in our district
of Georgia called Andersonville.

Any help would be greatly appreciated.

Best answer

You don't need a regular expression:

filename = 'test.txt'

# I want to say that Napp Granade
# serves in the spirit of a town in our dis-
# trict of Georgia called Anderson-
# ville.

with open(filename, encoding="utf8") as f:
    lines = [line.strip('\n') for line in f]

for num, line in enumerate(lines):
    # only join when a non-blank next line exists (avoids an IndexError
    # on the last line or before an empty line)
    if line.endswith('-') and num + 1 < len(lines) and lines[num+1].split():
        # the end of the word is at the start of next line
        end = lines[num+1].split()[0]
        # we remove the - and append the end of the word
        lines[num] = line[:-1] + end
        # and remove the end of the word and possibly the
        # following space from the next line
        lines[num+1] = lines[num+1][len(end)+1:]

text = '\n'.join(lines)

with open(filename, "w", encoding="utf8") as f:
    f.write(text)


# I want to say that Napp Granade
# serves in the spirit of a town in our district
# of Georgia called Andersonville.

But of course you can use one, and it is shorter:

import re

with open(filename, encoding="utf8") as f:
    text = f.read()

# re.sub returns a new string, so the result must be assigned
text = re.sub(r'-\n(\w+ *)', r'\1\n', text)

with open(filename, "w", encoding="utf8") as f:
    f.write(text)

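We look for a - followed by \n and capture the word that follows, which is the end of the split word.
We replace the whole match with the captured word followed by a newline.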
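To see the substitution at work without touching any files, here is a minimal sketch that applies it to an in-memory string. The helper names `join_hyphenated` and `join_hyphenated_strict` are hypothetical and not part of the original answer. The second helper is an assumed variant, not the answerer's method: it captures `\S+` instead of `\w+` so that punctuation attached to the word tail stays with the word, and it leaves the trailing spaces outside the group so they are dropped.

```python
import re

def join_hyphenated(text):
    # Pattern from the answer: the captured group includes any spaces after
    # the word tail, so those spaces end up before the new line break.
    return re.sub(r'-\n(\w+ *)', r'\1\n', text)

def join_hyphenated_strict(text):
    # Assumed variant (not from the original answer): \S+ keeps punctuation
    # such as "." attached to the word; the spaces after it are matched
    # outside the group and therefore discarded.
    return re.sub(r'-\n(\S+) *', r'\1\n', text)

sample = (
    "I want to say that Napp Granade\n"
    "serves in the spirit of a town in our dis-\n"
    "trict of Georgia called Andersonville.\n"
)

print(repr(join_hyphenated(sample)))
print(repr(join_hyphenated_strict(sample)))
```

With the answer's pattern, a space is left at the end of the line after "district", and a word tail followed directly by punctuation (for example "Anderson-" / "ville.") would have its period pushed onto the next line, because `\w` does not match punctuation. Whether that matters depends on the data; the variant is one possible way around it.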

Don't forget to use a raw string for the replacement, so that \1 is interpreted correctly.
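The difference the raw string makes can be checked directly. The short sketch below uses an illustrative input string (an assumption, chosen to match the sample data) and compares a raw and a non-raw replacement.

```python
import re

text = "dis-\ntrict of Georgia"
pattern = r'-\n(\w+ *)'

# Raw replacement: re.sub receives a backslash followed by "1" and treats
# it as a reference to the first captured group.
good = re.sub(pattern, r'\1\n', text)

# Non-raw replacement: Python itself turns '\1' into the control character
# chr(1) before re.sub sees it, so the captured word tail is lost.
bad = re.sub(pattern, '\1\n', text)

print(repr(good))
print(repr(bad))
```

In the non-raw case no error is raised, which makes the mistake easy to miss: the output silently contains a control character where the rest of the word should be.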

Regarding "Python: how do I rejoin hyphenated words that were split across line breaks?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43666790/
