python - How to separate a single newline from two newlines in a regular expression?

Reposted. Author: 行者123. Updated: 2023-12-01 08:22:13

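I want to group the output of a regular expression by:

  1. a single newline "\n"
  2. two newlines "\n\n"

How can I split the text into these 2 groups so that I can apply other regex split methods?

I have managed to find single newlines and double newlines separately. For example: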

Facebook and Google exploited a feature__(\n)__  
intended for “enterprise developers” to__(\n)__
distribute apps that collect large amounts__(\n)__
of data on private users, TechCrunch first reported.__(\n\n)__

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.__(\n)__
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?__(\n\n)__

Some text so on...

I tried this code:

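import re

def find_newlines(file):
    with open(file, "r") as content:
        text = content.read()
    # Splits on every run of one or more newlines, so single and
    # double newlines are treated the same way.
    content = re.split("\n+", text)
    return content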

The result is:

['Apple' , 'Something', 'Enything']

I want the following output:

['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.' __,__ 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']

I want group 1 to be the single newlines and group 2 to be the double newlines.

Best answer

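It looks like you are trying to group the text into two (or more) blocks separated by double newlines. One approach is therefore to first split the text on \n\n. The resulting blocks still contain single newlines, so each block can then have its remaining newlines replaced with spaces. All of this can be done with a Python list comprehension, as follows: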

text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""

content = [block.replace('\n', ' ') for block in text.split('\n\n')]

print(content)

This gives you a list with two entries and no newlines:

['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.', 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
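The same idea can be folded back into the file-reading function from the question. The following is only a minimal sketch, not the answerer's code: the function name `read_paragraphs`, the `encoding` parameter, and the throwaway demo file are assumptions made for illustration.

```python
import os
import tempfile


def read_paragraphs(path, encoding="utf-8"):
    """Return the blocks of a text file, one string per block.

    Blocks are separated by a double newline; single newlines inside
    a block are replaced by spaces.
    """
    with open(path, "r", encoding=encoding) as handle:
        text = handle.read()
    # strip() drops a trailing newline at end of file so it does not
    # become a trailing space in the last block.
    return [block.replace("\n", " ") for block in text.strip().split("\n\n")]


# Demo with a temporary file holding hypothetical sample text.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first line\nsecond line\n\nnext block\nstill next block\n")
    demo_path = tmp.name

paragraphs = read_paragraphs(demo_path)
os.remove(demo_path)
print(paragraphs)
# ['first line second line', 'next block still next block']
```

Note that splitting on exactly "\n\n" assumes blocks are separated by a single blank line; three or more consecutive newlines would leave stray leading spaces or empty entries.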

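A regular expression can be used when the blocks are separated by two or more consecutive newlines (that is, one or more blank lines), as follows: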

import re

text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.



Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""

content = [block.replace('\n', ' ') for block in re.split('\n{2,}', text)]

print(content)
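If the two kinds of separator really need to be told apart, as the question asks, one possible approach is to let `re.split` keep the separators by wrapping the pattern in a capturing group, and then decide per separator whether it is a line break or a block break. This is a sketch under that assumption, not part of the original answer; the sample text and the variable names `parts` and `blocks` are made up for illustration.

```python
import re

# Hypothetical sample: single newlines inside blocks, runs of two or
# three newlines between blocks.
text = "a\nb\n\nc\nd\n\n\ne"

# With a capturing group, re.split also returns the separators:
# even indexes hold text, odd indexes hold the newline runs.
parts = re.split(r"(\n+)", text)

blocks = [[]]
for index, part in enumerate(parts):
    if index % 2 == 0:
        # A piece of text: it belongs to the current block.
        blocks[-1].append(part)
    elif len(part) >= 2:
        # Two or more newlines: start a new block.
        blocks.append([])
    # A single newline needs no action; the join below adds the space.

content = [" ".join(lines) for lines in blocks]
print(content)
# ['a b', 'c d', 'e']
```

For the plain "join lines, split paragraphs" result, the list comprehension above is shorter; the capturing-group form is mainly useful when each separator has to be inspected individually.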

Regarding "python - How to separate a single newline from two newlines in a regular expression?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54553340/
