
Python regex - getting items from an orgmode file

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 02:55:25

I have the following org-mode syntax:

** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2

I want to extract the items under a heading, for example:

 getitems "Hardware"

I should get:

  - [ ] adapt a programmable motor to a tripod to be used for panning  

If I ask for "Reading - Health", I should get:

 - [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2

I am using the following pattern:

   pattern = re.compile("\*\* "+ head + " (.+?)\*?$", re.DOTALL)

The output when requesting "Reading - Technology" is:

  - [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2

I also tried:

   pattern = re.compile("\*\* "+ head + " (.+?)[\*|\z]", re.DOTALL)

This second pattern works for every heading except the last one.

Output when requesting "Reading - Health":

 - [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett

As you can see, it does not match the last line.

I am using Python 2.7 and findall.

Best answer

You can use

import re

string = """
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
"""

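def getitems(section):
    # Match the "** <section> ..." heading line, then capture everything up to the next "**" heading
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    try:
        items = rx.search(string)
        return items.group('block')
    except AttributeError:
        # search() returned None because no heading matched
        return None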

items = getitems('Reading - Technology')
print(items)

See it working on ideone.com.
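For comparison, the lazy pattern from the question can also be made to work. Without re.MULTILINE, `$` only matches at the very end of the string, so the lazy group has to run to the end of the file; and `[\*|\z]` is a character class rather than an alternation, so `\z` cannot act as an end-of-string anchor there (Python 2.7 spells that anchor `\Z`). The sketch below is one possible repair, not the answer's method: it stops the lazy group at a lookahead for the next heading or the end of the text. The names `ORG_TEXT` and `get_block_lazy` are made up for this illustration.

```python
import re

# Shortened sample data in the same shape as the question's file (assumption).
ORG_TEXT = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Convict Conditioning 1 and 2
"""


def get_block_lazy(text, head):
    """Return the lines under the heading starting with `head`, or None."""
    pattern = re.compile(
        # heading line, then a lazy capture that ends right before the
        # next "** " heading or at the end of the text
        r'^\*\* ' + re.escape(head) + r'[^\n]*\n(.*?)(?=^\*\* |\Z)',
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


print(get_block_lazy(ORG_TEXT, "Reading - Health"))
```

Because the end condition is a lookahead, the last section is handled the same way as every other one.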


The core of the code is the (condensed) expression:

^\*{2}.+[\n\r]       # match the beginning of the line, followed by two stars, anything else in between and a newline
(?P<block> # open group "block"
(?: # non-capturing group
(?!^\*{2}) # a neg. lookahead, making sure no ** follows at the beginning of a line
[\s\S] # any character...
)+ # ...at least once
) # close group "block"

In the actual code, the search string is inserted after the `**`. See a demo for Reading - Technology on regex101.com.
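The annotated form above can be kept in the source code by compiling it with re.VERBOSE. The following is a minimal sketch under that assumption; `build_section_regex` and `SAMPLE` are hypothetical names. In verbose mode unescaped whitespace is ignored, so the space after the two stars is written as `[ ]`, and `re.escape` takes care of the spaces inside the section name.

```python
import re

# Shortened sample data in the same shape as the question's file (assumption).
SAMPLE = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
"""


def build_section_regex(section):
    """Compile the answer's expression in commented (verbose) form."""
    return re.compile(
        r'''
        ^\*{2}[ ]               # two stars and a space at the start of a line
        ''' + re.escape(section) + r'''   # the requested heading text, escaped
        .+[\n\r]                # rest of the heading line and its line break
        (?P<block>              # open group "block"
            (?:                 # non-capturing group
                (?!^\*{2})      # not at the start of another heading line
                [\s\S]          # any character, including newlines
            )+                  # ...at least once
        )                       # close group "block"
        ''',
        re.MULTILINE | re.VERBOSE,
    )


match = build_section_regex("Reading - Health").search(SAMPLE)
print(match.group("block"))
```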


As a follow-up, you can also return only the checked items, like this:

def getitems(section, selected=None):
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    try:
        items = rx.search(string).group('block')
    except AttributeError:
        # search() returned None because no heading matched
        return None
    if selected:
        # Keep only checked items; optional leading whitespace is allowed,
        # since the items in the sample string start directly with "- "
        rxi = re.compile(r'^[ \t]*- \[X\] (.+)', re.MULTILINE)
        return rxi.findall(items)
    return items

items = getitems('Reading - Health', selected=True)
print(items)
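When several sections are needed at once, a single line-by-line pass may be easier to maintain than one regex search per heading. This is a different technique from the answer above, shown only as a sketch: `parse_sections`, `HEADING`, `ITEM`, and `SAMPLE` are hypothetical names, and it assumes headings look like `** Title [n/m]` and items like `- [ ] text` or `- [X] text`, as in the question.

```python
import re

# Shortened sample data in the same shape as the question's file (assumption).
SAMPLE = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
"""

# "** Title" with an optional trailing "[done/total]" counter
HEADING = re.compile(r'^\*\* (?P<title>.+?)(?: \[\d+/\d+\])?\s*$')
# "- [ ] text" or "- [X] text", optionally indented
ITEM = re.compile(r'^\s*- \[(?P<mark>[ X])\] (?P<text>.+?)\s*$')


def parse_sections(text):
    """Map each heading title to a list of (checked, item_text) tuples."""
    sections = {}
    current = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = sections.setdefault(heading.group('title'), [])
            continue
        item = ITEM.match(line)
        if item and current is not None:
            current.append((item.group('mark') == 'X', item.group('text')))
    return sections


sections = parse_sections(SAMPLE)
checked = [text for done, text in sections["Reading - Health"] if done]
print(checked)
```

The result is keyed by the title without the `[n/m]` counter, so a lookup such as `sections["Reading - Health"]` needs the exact title, unlike the prefix match used by `getitems`.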

Regarding "Python regex - getting items from an orgmode file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42542063/
