
python - Reformat subtitles to end with complete sentences

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:13:35

I have the following srt (subtitle) file:

import pysrt

srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice. So

02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific

04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will

05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.
"""

As you can see, the subtitles are split awkwardly. I would like each subtitle to end with a complete sentence, like this:

srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice.

02
00:02:19,000 --> 00:02:24,000
So what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping?

04
00:02:29,000 --> 00:02:34,000
What specific product they will purchase and also what is the brand that they will prefer.

05
00:02:34,000 --> 00:02:39,000
And of course many of the choices that are relevant in the context of marketing.
"""

I would like to know how to achieve this in Python. The subtitle text can be opened with pysrt:

import pysrt

srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice. So

02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific

04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will

05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing."""


with open("test.srt", "w") as text_file:
    text_file.write(srt)

sub = pysrt.open("test.srt")
text = sub.text

**Edit:**

Based on @Chris's answer, I tried:

import re
from operator import itemgetter

srt = """
01
00:02:14,000 --> 00:02:18,000
understand how customers do their choice. So

02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific

04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will

05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.
"""


l = [s.split('\n') for s in srt.strip().split('\n\n')]
whole = ' '.join(map(itemgetter(2), l))
for i, sen in enumerate(re.findall(r'([A-Z][^\.!?]*[\.!?])', whole)):
    l[i][2] = sen
print('\n\n'.join('\n'.join(s) for s in l))

But the result I get is exactly the same as the input...

01
00:02:14,000 --> 00:02:18,000
understand how customers do their choice. So

02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific

04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will

05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.

What am I doing wrong?

Best answer

This is a bit convoluted and error-prone, but it works as expected:

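import re
from operator import itemgetter

# Split the text into cues; each cue becomes [index, timing, text].
l = [s.split('\n') for s in srt.strip().split('\n\n')]
# Join the text line of every cue into one string.
whole = ' '.join(map(itemgetter(2), l))
# Write the i-th sentence found back into the i-th cue.
for i, sen in enumerate(re.findall(r'([A-Z][^\.!?]*[\.!?])', whole)):
    l[i][2] = sen
print('\n\n'.join('\n'.join(s) for s in l))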

Output:

01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice.

02
00:02:19,000 --> 00:02:24,000
So what is the choice of packaging that they prefer when they have to pick up something in a shelf?

03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping?

04
00:02:29,000 --> 00:02:34,000
What specific product they will purchase and also what is the brand that they will prefer.

05
00:02:34,000 --> 00:02:39,000
And of course many of the choices that are relevant in the context of marketing.
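
The snippet above works only when the number of sentences found equals the number of cues, and when every cue has exactly one text line. If there are more sentences than cues, `l[i]` raises an IndexError. If there are fewer, the trailing cues keep their old text. The following is a minimal sketch of the same idea wrapped in a function that checks for this mismatch. The function name `reflow_srt` and the choice to raise `ValueError` are assumptions made for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer: a capital letter, then everything up to
# the next '.', '!' or '?'.
SENTENCE_RE = re.compile(r'[A-Z][^.!?]*[.!?]')


def reflow_srt(srt_text):
    """Return srt_text with one full sentence per cue (hypothetical helper)."""
    # Each block is: index line, timing line, then one or more text lines.
    blocks = [b.split('\n') for b in srt_text.strip().split('\n\n')]
    # Join every text line of every cue, not only the first one.
    whole = ' '.join(' '.join(b[2:]) for b in blocks)
    sentences = SENTENCE_RE.findall(whole)
    # Assumption: fail loudly rather than silently drop or keep stale text.
    if len(sentences) != len(blocks):
        raise ValueError(
            f"{len(sentences)} sentences found for {len(blocks)} cues"
        )
    # Keep index and timing lines, replace the text with one sentence.
    rebuilt = ['\n'.join(b[:2] + [s]) for b, s in zip(blocks, sentences)]
    return '\n\n'.join(rebuilt)


sample = """
1
00:00:01,000 --> 00:00:02,000
Hello there. How

2
00:00:02,000 --> 00:00:03,000
are you?
"""
print(reflow_srt(sample))
```

Note that the timestamps are left untouched, so after the text is moved between cues the timings no longer line up exactly with the words being spoken.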

The regex part is taken from: Regex to find all sentences of text?
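
The pattern can be tried in isolation to see what it treats as a sentence: a match must start with an uppercase ASCII letter and end with `.`, `!` or `?`. The sketch below illustrates two consequences. Text before the first capital letter is skipped, and a trailing fragment without closing punctuation is not matched. The helper name `find_sentences` and the sample strings are assumptions chosen for illustration.

```python
import re

SENTENCE_RE = re.compile(r'[A-Z][^.!?]*[.!?]')


def find_sentences(text):
    """Return all substrings that the pattern treats as sentences."""
    return SENTENCE_RE.findall(text)


# Both sentences start with a capital letter and end with punctuation.
print(find_sentences("I understand. So what?"))
# → ['I understand.', 'So what?']

# The lowercase opening sentence and the unterminated tail are dropped.
print(find_sentences(
    "understand how customers do their choice. So what is the choice? What specific"
))
# → ['So what is the choice?']
```

This is why the number of sentences can differ from the number of cues when the first cue starts in lowercase or the last cue lacks final punctuation.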

Regarding python - Reformat subtitles to end with complete sentences, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56124718/
