
python - Regular expression not matching

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:04:59

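I'm writing a small Python script to collect some data from a database. The only problem is that when I export the data from MySQL as XML, the XML file contains a \b character. I wrote code to remove it, but then realized I don't need to do that processing every time, so I moved it into a method that I call only when I find a \b in the XML file. Now the regex doesn't match, even though I know the \b is there.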

Here is what I'm doing:

Main program:

'''Program should start here'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\b")
    if(p.match(line)):
        print p.match(line)
        processing = True
        break #only one match needed

if(processing):
    print "preprocess"
    preprocess(xml_file)

The preprocessing method:

def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = []
    for line in xml_file:
        lines.append(re.sub("\b", "", line))

    #go to the beginning of the file
    xml_file.seek(0);
    #overwrite with correct data
    for line in lines:
        xml_file.write(line);
    xml_file.truncate()

Any help would be great, thanks.

Best answer

\b is a special sequence for the regular expression engine:

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
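The distinction in the quoted documentation can be checked directly. The following is a minimal sketch, assuming Python 3; the `sample` string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical sample: a backspace character (0x08) sits between two words.
sample = "foo\x08bar baz"

# Raw string r"\b": the regex engine receives backslash + b, a word boundary.
# It matches empty positions, not characters.
word_boundaries = [m.start() for m in re.finditer(r"\b", sample)]

# Inside a character class, \b stands for the backspace character.
backspaces_class = [m.start() for m in re.finditer(r"[\b]", sample)]

# \x08 names the backspace character explicitly, so there is no ambiguity.
backspaces_hex = [m.start() for m in re.finditer(r"\x08", sample)]

print(word_boundaries)
print(backspaces_class, backspaces_hex)
```

The word-boundary pattern reports several zero-width positions, while the two backspace patterns report only the index of the actual 0x08 character.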

So you need to escape it in order to find it with a regular expression.
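One way to apply this to the scenario in the question is sketched below, assuming Python 3. The names `BACKSPACE`, `needs_preprocessing`, and `strip_backspaces`, and the sample lines, are hypothetical and not part of the original code. The sketch writes the backspace as `\x08` so it cannot be read as a word boundary, and it uses `search()` rather than `match()`, because `re.match` only tests the start of the string, which may also matter for the detection loop in the question.

```python
import re

# "\x08" is the backspace character; spelled this way it cannot be
# confused with the regex word-boundary sequence \b.
BACKSPACE = re.compile("\x08")


def needs_preprocessing(lines):
    """Return True if any line contains a backspace character."""
    # search() scans the whole line; match() only tests its beginning.
    return any(BACKSPACE.search(line) for line in lines)


def strip_backspaces(lines):
    """Return a new list of lines with every backspace removed."""
    return [BACKSPACE.sub("", line) for line in lines]


# Made-up stand-in for the exported XML file's lines.
sample_lines = ["<row>\n", "<name>ab\x08c</name>\n", "</row>\n"]

if needs_preprocessing(sample_lines):
    cleaned = strip_backspaces(sample_lines)
    print(cleaned)
```

Working on a list of lines keeps the sketch independent of file handling; the cleaned lines could then be written back with the same seek, write, and truncate steps used in the question.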

Regarding "python - Regular expression not matching", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6344709/
