
python - Rearrange text blocks so that each block ends with a complete sentence

Repost. Author: 太空宇宙. Updated: 2023-11-04 02:04:51

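I have three text blocks (actually many more...), each holding part of a complete text. However, the original text was not split correctly: some sentences are divided across two text blocks.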

text1 = "We will talk about data about model specification parameter \
estimation and model application and the context where we will apply \
the simple example.Is an application where we would like to analyze \
the market for electric cars because"

text2 = "we are interested in the market of electric cars.The choice \
that we are interested in is the choice of each individual to \
purchase an electric car or not And we will see how"

text3 = "to address this question. Furthermore, it needs to be noted that this is only a model text and there is no content associated with it. "

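For example, text2 starts with "we are interested in the market of electric cars". That is not a complete sentence: it is the tail of a sentence that actually begins in text block 1 (see the last sentence there).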

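I want to make sure that every text block ends with a complete sentence. So I want to move the incomplete first sentence of a block to the end of the previous text block. For the example here, the result would be: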

text1corr = "We will talk about data about model specification parameter \
estimation and model application and the context where we will apply \
the simple example.Is an application where we would like to analyze \
the market for electric cars because we are interested in the market of electric cars."

text2corr = "The choice that we are interested in is the choice of each individual to purchase an electric car or not And we will see how to address this question."

text3corr = "Furthermore, it needs to be noted that this is only a model text and there is no content associated with it. "

How can I do this in Python? Is it even possible?

Best answer

You can use the function zip_longest() to iterate over pairs of consecutive strings:

from itertools import zip_longest
import re

l = [text1, text2, text3]
new_l = []

# pair each block with the one after it; the last block is paired with ''
for i, j in zip_longest(l, l[1:], fillvalue=''):
    # remove leading and trailing spaces
    i, j = i.strip(), j.strip()
    # remove leading half sentence (it was appended to the previous block)
    if i[0].islower():
        i = re.split(r'[.?!]', i, maxsplit=1)[-1].lstrip()
    # append half sentence from next string
    if i[-1].isalpha():
        j = re.split(r'[.?!]', j, maxsplit=1)[0]
        i = f"{i} {j}."
    new_l.append(i)

for i in new_l:
    print(i)

Output:

We will talk about data about model specification parameter estimation and model application and the context where we will apply the simple example.Is an application where we would like to analyze the market for electric cars because we are interested in the market of electric cars.
The choice that we are interested in is the choice of each individual to purchase an electric car or not And we will see how to address this question.
Furthermore, it needs to be noted that this is only a model text and there is no content associated with it.
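
As a possible variation (a sketch, not part of the original answer): the loop above always ends the borrowed fragment with "." and decides what to strip by checking for a lowercase first letter. The version below instead remembers how many characters were borrowed from the next block and skips exactly that many, which also keeps a "?" or "!" intact. The function name realign_blocks and the skip counter are hypothetical names chosen for this illustration, and the sketch assumes a sentence spans at most two blocks.

```python
import re
from itertools import zip_longest

# Sentence terminators assumed for this sketch.
SENTENCE_END = re.compile(r"[.?!]")


def realign_blocks(blocks):
    """Return new blocks in which a trailing sentence fragment is
    completed with the start of the following block."""
    cleaned = [b.strip() for b in blocks]
    result = []
    skip = 0  # characters of the current block already moved to the previous one
    for cur, nxt in zip_longest(cleaned, cleaned[1:], fillvalue=""):
        # Drop the part that the previous block borrowed.
        cur = cur[skip:].lstrip()
        skip = 0
        # A block ending in a letter is treated as an unfinished sentence.
        if cur and cur[-1].isalpha():
            match = SENTENCE_END.search(nxt)
            # Borrow up to and including the first terminator, or the
            # whole next block if it contains none.
            skip = match.end() if match else len(nxt)
            cur = f"{cur} {nxt[:skip]}".strip()
        result.append(cur)
    return result


print(realign_blocks(["Is this", "split? Yes it is."]))
# ['Is this split?', 'Yes it is.']
```

Called as realign_blocks([text1, text2, text3]) with the three strings from the question, this should give the same three lines as the output above.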

Regarding "python - Rearrange text blocks so that each block ends with a complete sentence", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54922364/
