
python - Regex re.sub issue with file front matter

Reposted. Author: 行者123. Updated: 2023-12-04 10:41:07

This question already has answers here:

Is it possible to use PyYAML to read a text file written with a “YAML front matter” block inside?

(2 answers)

Closed last year.




I am trying to retrieve the front matter from a .md file. When every front matter entry sits on a single line, I can retrieve the content.

Example:

    ---
    title: "Meeting"
    date: 2019-03-14T07:51:28+01:00
    draft: false
    status:["process", "todo"]
    ---

So I wrote the following Python script to get the front matter:

    import re

    def get_front_matter(file, start='---', end='---'):
        """Strip file and retrieve front matter then format the value"""
        content = {}
        with open(file, 'r', encoding='UTF-8') as file_content:
            # skip everything up to the opening marker
            for content_line in file_content:
                if content_line.strip() == start:
                    break
            # read key: value lines until the closing marker
            for content_line in file_content:
                if content_line.strip() == end:
                    break

                line_data = content_line.split(':', 1)
                # If we cannot split decently, carry on
                if len(line_data) != 2:
                    continue
                # format the string to store in dict for better usage
                content[line_data[0]] = re.sub(r"[\n\t]*", "", line_data[1]).strip(' "')
        return content


But I run into a problem when a value, such as status, spans multiple lines:

    ---
    title: "Meeting"
    date: 2019-03-14T07:51:28+01:00
    draft: false
    status:
    [
    "process",
    "todo",
    "hold"
    ]
    ---

When I try to read the front matter of the file above, I get a blank value for status, but it should look like this:

    {'title': 'Meeting', 'date': '2019-03-14T07:51:28+01:00', 'draft': 'false', 'status': '["process", "todo", "hold"]'}

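Is there another way to read the front matter, either line by line or by key? I tried some regular expressions but could not retrieve a group of lines.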
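The blank value can be reproduced in isolation. This is a minimal sketch, assuming the file is read line by line exactly as in the script above; the `lines` list is a hand-written stand-in for the file iterator, and `parsed` and `skipped` are illustrative names only.

```python
import re

# Lines as the file iterator would yield them for the multi-line example.
lines = ['status:\n', '[\n', '"process",\n', '"todo",\n', '"hold"\n', ']\n']

parsed = {}
skipped = []
for line in lines:
    parts = line.split(':', 1)
    if len(parts) != 2:
        # no ':' on this line, so the original loop drops it
        skipped.append(line.strip())
        continue
    # the 'status:' line leaves only '\n' after the colon, which cleans to ''
    parsed[parts[0]] = re.sub(r"[\n\t]*", "", parts[1]).strip(' "')

print(parsed)   # {'status': ''}
print(skipped)  # ['[', '"process",', '"todo",', '"hold"', ']']
```

The key is stored as soon as its own line is read, and every following line of the list has no ':' and is skipped, which is why status ends up empty.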

Best answer

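I kept your code mostly as it was. The key is not to add a value to the result until we are sure we have collected the complete value (when it is split across multiple lines). This is done by checking the next line: if it is a valid pair (key: some value) or the end marker ---, the previous key and its content are added to the result. I hope the comments make things clearer.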

    import re

    def get_front_matter(file, start='---', end='---'):
        """Strip file and retrieve front matter then format the value"""
        result = {}
        with open(file, 'r', encoding='UTF-8') as file_content:
            for content_line in file_content:
                if content_line.strip() == start:
                    break

            content = ''
            key = ''
            for content_line in file_content:
                if content_line.strip() == end:
                    if key and content:
                        # add the last key: content pair before breaking out
                        result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    break

                line_data = content_line.split(':', 1)
                if len(line_data) == 2 and not content:
                    # first key: content pair; nothing is pending yet, so
                    # hold it and check the next line before storing it
                    key = line_data[0]
                    content = line_data[1]
                    continue
                elif len(line_data) == 2:  # we found another valid pair
                    # store the previous (key, content) and hold the new one
                    result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    key = line_data[0]
                    content = line_data[1]
                else:
                    # not a key: value line, so it belongs to the previous
                    # value, which is split across multiple lines
                    content += content_line

        return result

Note: I renamed the content dictionary to result. This code will break for a case like this:

    title: "Meeting"
    date: 2019-03-14T07:51:28+01:00
    draft: false
    status:
    [
    "somevalue:process", # if the value contains ':'
    "todo",
    "hold"
    ]

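Here you did not specify how to tell a key apart from a value that contains ':' when that value sits on a line without its key in front of it. I hope this does not affect your case.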
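One possible way around that ambiguity, offered only as a sketch: treat a line as a new key only when it starts with a bare word followed by ':'. That rule is an assumption about the file format, not something stated in the question. The names `FRONT_MATTER_RE`, `KEY_RE`, and `parse_front_matter` are hypothetical, and the block is assumed to open the text and to use `\n` line endings.

```python
import json
import re

# Assumption: the block is the first thing in the text, fenced by '---' lines.
FRONT_MATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# Assumption: a key is a bare word at the very start of a line.
KEY_RE = re.compile(r"^([A-Za-z_][\w-]*):(.*)$")


def parse_front_matter(text):
    """Return {key: raw string value}; multi-line values are joined."""
    match = FRONT_MATTER_RE.search(text)
    if match is None:
        return {}
    result = {}
    key = None
    for line in match.group(1).splitlines():
        key_match = KEY_RE.match(line)
        if key_match:
            key = key_match.group(1)
            result[key] = key_match.group(2).strip()
        elif key is not None:
            # continuation line: glue it onto the current value
            result[key] = (result[key] + " " + line.strip()).strip()
    # drop surrounding double quotes, as the original script does
    return {k: v.strip('"') for k, v in result.items()}


sample = '''---
title: "Meeting"
date: 2019-03-14T07:51:28+01:00
draft: false
status:
[
"somevalue:process",
"todo",
"hold"
]
---
Body text
'''

parsed = parse_front_matter(sample)
print(parsed['title'])               # Meeting
print(json.loads(parsed['status']))  # ['somevalue:process', 'todo', 'hold']
```

Because the quoted line starts with `"` rather than a bare word, it is treated as a continuation of status. An unquoted continuation such as `somevalue:process` would still be misread as a key, so this only narrows the problem. The `json.loads` call works here because the joined value happens to be valid JSON.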

Regarding "python - Regex re.sub issue with file front matter", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59927959/
