gpt4 book ai didi

python - 使用 Python RegEx re.findall 解析文本

转载 作者:太空宇宙 更新时间:2023-11-04 10:13:21 26 4
gpt4 key购买 nike

我有一个很长的字符串,我需要分组解析,但需要更多地控制它。

import re

RAW_Data = "Name Multiple Words Testing With 1234 Numbers and this stuff* ((Bla Bla Bla (Bla Bla) A40 & A41)) Name Multiple Words Testing With 3456 Numbers and this stuff2* ((Bla Bla Bla (Bla Bla) A42 & A43)) Name Multiple Words Testing With 78910 Numbers and this stuff3* ((Bla Bla Bla (Bla Bla) A44 & A45)) Name Multiple Words Testing With 1234 Numbers and this stuff4* ((Bla Bla Bla (Bla Bla) A46 & A47)) Name Multiple Words Testing With 1234 Numbers and this stuff5* ((Bla Bla Bla (Bla Bla) A48 & A49)) Name Multiple Words Testing With 1234 Numbers and this stuff6* ((Bla Bla Bla (Bla Bla) A50 & A51)) Name Multiple Words Testing With 1234 Numbers and this stuff7* ((Bla Bla Bla (Bla Bla) A52 & A53)) Name Multiple Words Testing With 1234 Numbers and this stuff8* ((Bla Bla Bla (Bla Bla) A54 & A55)) Name Multiple Words Testing With 1234 Numbers and this stuff9* ((Bla Bla Bla (Bla Bla) A56 & A57)) Name Multiple Words Testing With 1234 Numbers and this stuff10* ((Bla Bla Bla (Bla Bla) A58 & A59)) Name Multiple Words Testing With 1234 Numbers and this stuff11* ((Bla Bla Bla (Bla Bla) A60 & A61)) Name Multiple Words Testing With 1234 Numbers and this stuff12* ((Bla Bla Bla (Bla Bla) A62 & A63)) Name Multiple Words Testing With 1234 Numbers and this stuff13* ((Bla Bla Bla (Bla Bla) A64 & A65)) Name Multiple Words Testing With 1234 Numbers and this stuff14* ((Bla Bla Bla (Bla Bla) A66 & A67)) Name Multiple Words Testing With 1234 Numbers and this stuff15* ((Bla Bla Bla (Bla Bla) A68 & A69)) Name Multiple Words Testing With 1234 Numbers and this stuff16*"

fromnode = re.findall('(.*?)(?=\*\s)', RAW_Data)

print fromnode

del fromnode
del RAW_Data

结果是:'用 1234 个数字和这个东西命名多个单词测试', '', '((Bla Bla Bla (Bla Bla) A40 & A41)) 用 3456 个数字和这个东西命名多个单词测试2' < em>........等等。

我似乎无法只捕获像“用 3456 个数字和这些东西命名多个单词测试”这样的字符串,而忽略像“((Bla Bla Bla (Bla Bla) A40 & A41))”这样的所有字符串。任何帮助将不胜感激。

最佳答案

你可以拆分

r'\*\s*\({2}.*?\){2}\s*'

模式 ( see demo ) 匹配:

  • \* - 文字星号
  • \s* - 零个或多个空格
  • \({2} - 恰好 2 个左括号
  • .*? - 除换行符外的零个或多个字符(注意:如果需要跨多行匹配,请添加 re.S 标志)少至可能到第一个
  • \){2} - 双右括号
  • \s* - 0+ 空格。

另外:same, but unrolled (thus, a bit more efficient) regex:

\*\s*\({2}[^)]*(?:\)(?!\))[^)]*)*\){2}\s*

参见 IDEONE demo:

import re
p = re.compile(r'\*\s*\({2}.*?\){2}\s*')
test_str = "Name Multiple Words Testing With 1234 Numbers and this stuff* ((Bla Bla Bla (Bla Bla) A40 & A41)) Name Multiple Words Testing With 3456 Numbers and this stuff2* ((Bla Bla Bla (Bla Bla) A42 & A43)) Name Multiple Words Testing With 78910 Numbers and this stuff3* ((Bla Bla Bla (Bla Bla) A44 & A45)) Name Multiple Words Testing With 1234 Numbers and this stuff4* ((Bla Bla Bla (Bla Bla) A46 & A47)) Name Multiple Words Testing With 1234 Numbers and this stuff5* ((Bla Bla Bla (Bla Bla) A48 & A49)) Name Multiple Words Testing With 1234 Numbers and this stuff6* ((Bla Bla Bla (Bla Bla) A50 & A51)) Name Multiple Words Testing With 1234 Numbers and this stuff7* ((Bla Bla Bla (Bla Bla) A52 & A53)) Name Multiple Words Testing With 1234 Numbers and this stuff8* ((Bla Bla Bla (Bla Bla) A54 & A55)) Name Multiple Words Testing With 1234 Numbers and this stuff9* ((Bla Bla Bla (Bla Bla) A56 & A57)) Name Multiple Words Testing With 1234 Numbers and this stuff10* ((Bla Bla Bla (Bla Bla) A58 & A59)) Name Multiple Words Testing With 1234 Numbers and this stuff11* ((Bla Bla Bla (Bla Bla) A60 & A61)) Name Multiple Words Testing With 1234 Numbers and this stuff12* ((Bla Bla Bla (Bla Bla) A62 & A63)) Name Multiple Words Testing With 1234 Numbers and this stuff13* ((Bla Bla Bla (Bla Bla) A64 & A65)) Name Multiple Words Testing With 1234 Numbers and this stuff14* ((Bla Bla Bla (Bla Bla) A66 & A67)) Name Multiple Words Testing With 1234 Numbers and this stuff15* ((Bla Bla Bla (Bla Bla) A68 & A69)) Name Multiple Words Testing With 1234 Numbers and this stuff16*"
print(re.split(p, test_str))

更新

用于 re.findall 的正则表达式:

(?:\*\s*\(\([^)]*(?:\)(?!\))[^)]*)*\)\))?\s*([^*]*(?:\*(?!\s*\(\()[^*]*)*)\s*

查看 regex demo

被它的外表吓坏了吗?它只是更简单的 (?:\*\s*\(\(.*?\)\))?\s*(.*?(?=\*\s*(?:\(\(|$))) 的展开版本。

参见 IDEONE demo

关于python - 使用 Python RegEx re.findall 解析文本,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36924820/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com