
python - Some issues with Python regex findall

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:13:55

Given this source string:

string ="""
html,,
head,, profile http://gmpg.org/xfn/11 ,,
lang en-US ,,

title,, Some markright page.
,,title
,,head
"""

...it has to be parsed into HTML:

<html>
<head profile="http://gmpg.org/xfn/11" lang="en-US">
<title>Some markright page</title>
</head>

I want to parse it in a single re.findall pass, like this:

tagList = re.findall( 
r'\s*([A-Z]?[a-z]+[0-9]?,,){1}' # Opening tag - has to be one
r'(.* ,,)*' # Attributes - could be more than one
r'(.*)?' # Content - could be one
r'(\s+,,[a-z]+[0-9]?)?' # Ending tag - could be one
, string )#, flags=re.S ) # can't make any use of DOTALL flag

# Print every captured group of every match (Python 3 print calls)
for t in tagList:
    for n, s in enumerate(t, 1):
        print("String group No:" + str(n) + " -> ", s.strip())
    print("_" * 10)

...but all I get is:

String group No:1 ->  html,,
String group No:2 ->
String group No:3 ->
String group No:4 ->
__________
String group No:1 -> head,,
String group No:2 -> profile http://gmpg.org/xfn/11 ,,
String group No:3 ->
String group No:4 ->
__________
String group No:1 -> title,,
String group No:2 ->
String group No:3 -> Some markright page.
String group No:4 -> ,,title

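Please keep in mind that I have to write my own parser, and the problem above is just one application of a superset of this markup, so please help if you can and are willing to. Thanks.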
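For readers following along, two general behaviours of Python's `re` module are worth keeping in mind when reading the output above. The sketch below is only an illustration with made-up sample strings (`text` and the `a ,, b ,, c ,,` input are not from the question), not a full diagnosis of every empty group: without `re.S` a `.` never crosses a newline, which is consistent with the `lang` attribute on the next line not being captured, and a repeated capturing group only reports its last repetition.

```python
import re

# Hypothetical two-line sample, shaped like the question's markup.
text = "head,, profile x ,,\nlang en-US ,,"

# 1) Without re.S (DOTALL), "." does not match a newline,
#    so a greedy ".*" stops at the end of the first line.
first_line = re.match(r'.*', text).group(0)

# 2) With re.S the same greedy ".*" runs to the end of the string.
whole = re.match(r'.*', text, flags=re.S).group(0)

# 3) A repeated capturing group keeps only its last repetition.
last_rep = re.match(r'(\w+ ,,\s*)+', 'a ,, b ,, c ,,').group(1)

# 4) Matching one token per match keeps every occurrence.
tokens = re.findall(r'\w+ ,,', 'a ,, b ,, c ,,')

print(repr(first_line))
print(whole == text)
print(repr(last_rep))
print(tokens)
```

This is one reason a token-by-token scan (one tag, attribute, or text run per match) tends to be easier to control than a single pattern that tries to capture a whole element at once.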

Best answer

This is how I would do it:

#!/usr/bin/python
import re

pat = re.compile(r'''
(?P<open> \b [^\W_]+ ) ,, |
,, (?P<close> [^\W_]+ ) \b |
(?P<attrName> \S+ ) [ ] (?P<attrValue> [^,\n]+ ) [ ] ,, |
(?P<textContent> [^,\s] (?: [^,] | , (?!,) )*? ) \s* (?=[^\W_]*,,)''',
re.X)

txt = '''html,,
head,, profile http://gmpg.org/xfn/11 ,,
lang en-US ,,

title,, Some markright page.
,,title
,,head'''

result = ''
opened = False  # True while an opening tag still lacks its '>'
for m in pat.finditer(txt):
    if m.group('attrName'):
        # Attributes are appended inside the still-open tag
        result += ' ' + m.group('attrName') + '="' + m.group('attrValue') + '"'
    else:
        if opened:
            # Any other token closes the pending opening tag first
            opened = False
            result += '>'
        if m.group('open'):
            result += '<' + m.group('open')
            opened = True
        elif m.group('close'):
            result += '</' + m.group('close') + '>'
        else:
            result += m.group('textContent')
print(result)

Note: I am assuming that text content is always enclosed between tags.
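To see what that assumption means in practice, here is a minimal Python 3 sketch. The pattern is copied from the answer; the names `PAT`, `markright_to_html`, `SAMPLE`, and the short `p,,` inputs are introduced here purely for illustration. The text alternative only matches when a `,,` marker follows it, so text placed after the last closing tag appears to be dropped silently.

```python
import re

# Pattern copied from the answer above (verbose mode ignores the indentation).
PAT = re.compile(r'''
    (?P<open> \b [^\W_]+ ) ,, |
    ,, (?P<close> [^\W_]+ ) \b |
    (?P<attrName> \S+ ) [ ] (?P<attrValue> [^,\n]+ ) [ ] ,, |
    (?P<textContent> [^,\s] (?: [^,] | , (?!,) )*? ) \s* (?=[^\W_]*,,)''',
    re.X)


def markright_to_html(txt):
    """Hypothetical wrapper around the answer's loop; returns the HTML string."""
    parts = []
    opened = False  # True while an opening tag still lacks its '>'
    for m in PAT.finditer(txt):
        if m.group('attrName'):
            parts.append(' %s="%s"' % (m.group('attrName'), m.group('attrValue')))
            continue
        if opened:
            parts.append('>')
            opened = False
        if m.group('open'):
            parts.append('<' + m.group('open'))
            opened = True
        elif m.group('close'):
            parts.append('</' + m.group('close') + '>')
        else:
            parts.append(m.group('textContent'))
    return ''.join(parts)


# The sample string from the question.
SAMPLE = '''html,,
head,, profile http://gmpg.org/xfn/11 ,,
lang en-US ,,

title,, Some markright page.
,,title
,,head'''

full = markright_to_html(SAMPLE)
enclosed = markright_to_html('p,, hello\n,,p')
trailing = markright_to_html('p,, hello\n,,p\ntrailing words')

print(full)
print(enclosed)
print(trailing)  # same as `enclosed`: the trailing text is not matched
```

If loose text outside tags has to be preserved, the text alternative would need a different terminating condition; that is beyond what the answer covers.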

Regarding "python - Some issues with Python regex findall", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25327560/
