Python: merging lines in a txt file

Reposted. Author: 太空狗. Updated: 2023-10-30 02:44:16

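A question about merging lines in a txt file.

The file content is shown below (movie subtitles). Within each subtitle block, I want to merge the English text into a single line, instead of having it spread over 1, 2, or 3 lines as it is now.

Could you please tell me which approach would work in Python? Thank you very much.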

1
00:00:23,343 --> 00:00:25,678
Been a while since I was up here
in front of you.

2
00:00:25,762 --> 00:00:28,847
Maybe I'll do us all a favour
and just stick to the cards.

3
00:00:31,935 --> 00:00:34,603
There's been speculation that I was
involved in the events that occurred
on the freeway and the rooftop...

4
00:00:36,189 --> 00:00:39,233
Sorry, Mr Stark, do you
honestly expect us to believe that

5
00:00:39,317 --> 00:00:42,903
that was a bodyguard
in a suit that conveniently appeared,

6
00:00:42,987 --> 00:00:45,698
despite the fact
that you sorely despise bodyguards?

7
00:00:45,782 --> 00:00:46,907
Yes.

8
00:00:46,991 --> 00:00:51,662
And this mysterious bodyguard
was somehow equipped

Best answer

An intuitive solution

A simple solution based on the 4 kinds of lines you can have:

  • an empty line
  • a number indicating the position (no letters)
  • the timing of the subtitle (follows a specific pattern; no letters)
  • the text

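You can loop over the lines, classify each one, and act accordingly.

In fact, the action for the non-text, non-empty lines (timing lines and numbers) is the same. So: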

import re

with open('yourfile.txt') as f:
    exampleText = f.read()

new = ''

for line in exampleText.split('\n'):
    if line == '':
        # blank line: end the merged text line and keep the block separator
        new += '\n\n'
    elif re.search('[a-zA-Z]', line):  # check if there is text
        new += line + ' '
    else:
        # number or timing line: keep it on its own line
        new += line + '\n'

Result:

>>> print(new)
1
00:00:23,343 --> 00:00:25,678
Been a while since I was up here in front of you.

2
00:00:25,762 --> 00:00:28,847
Maybe I'll do us all a favour and just stick to the cards.
...
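
As a variant sketch (not part of the original answer), the same merge can be done per block rather than per line: split the text at blank lines, keep the first two lines of each block, and join the rest with spaces. This assumes every block is well formed (a number line, then a timing line, then the text) and that blocks are separated by blank lines. The function name merge_subtitle_lines is hypothetical.

```python
import re

def merge_subtitle_lines(text):
    """Join the text lines of each subtitle block into a single line.

    Assumes blocks are separated by blank lines and that each block
    starts with a number line followed by a timing line.
    """
    merged_blocks = []
    # Split on blank lines; strip() avoids empty leading/trailing blocks.
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = block.split('\n')
        header = lines[:2]  # number line and timing line
        body = ' '.join(part.strip() for part in lines[2:])  # subtitle text
        merged_blocks.append('\n'.join(header + ([body] if body else [])))
    return '\n\n'.join(merged_blocks) + '\n'
```

Because this version relies on position rather than on the presence of letters, a text line such as "..." or "1984" stays with the subtitle text. To use it, read the file with f.read(), pass the string to the function, and write the returned string to a new file.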

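Regex explained:

  • [] matches any one of the characters listed inside it
  • a-z is the range of characters from a to z
  • A-Z is the range of characters from A to Z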
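
To illustrate how this character class sorts the three kinds of non-empty lines, here is a small check on lines taken from the sample subtitles. The list name samples is only for illustration.

```python
import re

samples = [
    "1",                                 # number line
    "00:00:23,343 --> 00:00:25,678",     # timing line
    "Been a while since I was up here",  # text line
]

# re.search returns a match object if the line contains at least one
# ASCII letter anywhere, and None otherwise.
has_letter = [bool(re.search('[a-zA-Z]', s)) for s in samples]
print(has_letter)  # [False, False, True]
```

Note that a text line without any ASCII letter, such as "..." or a line in a non-Latin script, would be treated like a number or timing line by this test.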

On merging lines in a txt file with Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30251658/
