python - Regex: replace text unless it is between quotes

Reposted. Author: 行者123. Updated: 2023-12-01 09:34:37

I am developing a transpiler and want to replace the tokens of my language with Python tokens. The replacement is done like this:

import re

# reps: list of (pattern, replacement) pairs; transpiled: the text being converted
for pattern, translated in reps:
    # Replaces every [pattern] with [translated] in [transpiled]
    transpiled = re.sub(pattern, translated, transpiled, flags=re.UNICODE)

Here reps is a list of (regex to replace, replacement string) ordered pairs, and transpiled is the text to be transpiled.

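However, I can't seem to find a way to exclude text between quotes from the replacement process. Note that this is for a language, so it should also work with escaped quotes and single quotes.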
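To illustrate the problem, here is a minimal sketch of the loop above applied to a made-up input line. The sample text and the single ("foo", "bar") pair are assumptions chosen for illustration, not part of the real transpiler.

```python
import re

# Hypothetical replacement table and input, for illustration only
reps = [("foo", "bar")]
transpiled = 'x = foo + "foo"'

for pattern, translated in reps:
    transpiled = re.sub(pattern, translated, transpiled, flags=re.UNICODE)

print(transpiled)  # x = bar + "bar"
```

The token inside the string literal is rewritten as well, which is exactly what should not happen.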

Best answer

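It may depend on how you define your patterns, but in general you can always surround your pattern with lookbehind and lookahead groups to make sure that text between quotes is not matched: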

import re

transpiled = "A foo with \"foo\" and single quoted 'foo'. It even has an escaped \\'foo\\'!"

reps = [("foo", "bar"), ("and", "or")]

print(transpiled)  # before the changes

for pattern, translated in reps:
    # match only if the character before is not a quote and the character
    # after (optionally preceded by a backslash) is not a quote either
    transpiled = re.sub("(?<=[^\"']){}(?=\\\\?[^\"'])".format(pattern),
                        translated, transpiled, flags=re.UNICODE)
    print(transpiled)  # after each change

This will produce:

A foo with "foo" and single quoted 'foo'. It even has an escaped \'foo\'!
A bar with "foo" and single quoted 'foo'. It even has an escaped \'foo\'!
A bar with "foo" or single quoted 'foo'. It even has an escaped \'foo\'!
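Note that the lookarounds only inspect the characters directly next to the match. The sketch below wraps the same pattern in a helper (the name replace_unless_adjacent_quote and the sample strings are made up for this illustration) to show two consequences: a word in the middle of a longer quoted phrase is still replaced, and a word at the very start of the text is never replaced, because the lookbehind needs a preceding character.

```python
import re

def replace_unless_adjacent_quote(text, pattern, replacement):
    # Same lookbehind/lookahead wrapper as in the loop above
    wrapped = "(?<=[^\"']){}(?=\\\\?[^\"'])".format(pattern)
    return re.sub(wrapped, replacement, text, flags=re.UNICODE)

# The word is inside quotes but does not touch them, so it is replaced
print(replace_unless_adjacent_quote("say 'I have a foo here'", "foo", "bar"))
# say 'I have a bar here'

# The word is at position 0, so the lookbehind fails and nothing is replaced
print(replace_unless_adjacent_quote("foo first", "foo", "bar"))
# foo first
```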

UPDATE: If you want to ignore whole quoted swaths of text, not just a single quoted word, you'll have to do a bit more work. You could do it by looking for repeated quotation marks, but the whole lookahead/lookbehind mechanism would get really messy and probably far from optimal. It's easier to separate the quoted from the non-quoted text and do replacements only in the latter, something like:

import re

QUOTED_STRING = re.compile("(\\\\?[\"']).*?\\1")  # a pattern to match strings between quotes

def replace_multiple(source, replacements, flags=0):  # a convenience replacement function
    if not source:  # no need to process empty strings
        return ""
    for r in replacements:
        source = re.sub(r[0], r[1], source, flags=flags)
    return source

def replace_non_quoted(source, replacements, flags=0):
    result = []  # a store for the result pieces
    head = 0  # a search head reference
    for match in QUOTED_STRING.finditer(source):
        # process everything until the current quoted match and add it to the result
        result.append(replace_multiple(source[head:match.start()], replacements, flags))
        result.append(match[0])  # add the quoted match verbatim to the result
        head = match.end()  # move the search head to the end of the quoted match
    if head < len(source):  # if the search head is not at the end of the string
        # process the rest of the string and add it to the result
        result.append(replace_multiple(source[head:], replacements, flags))
    return "".join(result)  # join back the result pieces and return them

You can test it as:

print(replace_non_quoted("A foo with \"foo\" and 'foo', says: 'I have a foo'!", reps))
# A bar with "foo" or 'foo', says: 'I have a foo'!
print(replace_non_quoted("A foo with \"foo\" and foo, says: \\'I have a foo\\'!", reps))
# A bar with "foo" or bar, says: \'I have a foo\'!
print(replace_non_quoted("A foo with '\"foo\" and foo', says - I have a foo!", reps))
# A bar with '"foo" and foo', says - I have a bar!
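To see which regions are protected, it can help to list what the quote pattern matches on its own. The sketch below recompiles the same expression and prints its matches for two sample strings (both made up for this illustration). The second one shows a caveat: when an escaped quote appears inside a string delimited by the same unescaped quote, the match ends early at the escaped quote.

```python
import re

# Same expression as QUOTED_STRING in the answer above
QUOTED_STRING = re.compile("(\\\\?[\"']).*?\\1")

sample = "A foo with \"foo\" and \\'I have a foo\\'!"
matches = [m.group(0) for m in QUOTED_STRING.finditer(sample)]
for m in matches:
    print(m)
# "foo"
# \'I have a foo\'

# Caveat: the opening quote is ', so the first later ' closes the match,
# even though it is preceded by a backslash
tricky = "'it\\'s' foo"
tricky_matches = [m.group(0) for m in QUOTED_STRING.finditer(tricky)]
for m in tricky_matches:
    print(m)
# 'it\'
```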

As an added bonus, this also lets you use full regular expression patterns, including backreferences, in your replacements:

print(replace_non_quoted("My foo and \"bar\" are like 'moo' and star!",
                         ((r"(\w+)oo", r"oo\1"), (r"(\w+)ar", r"ra\1"))))
# My oof and "bar" are like 'moo' and rast!

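However, if your replacements don't involve patterns and you only need simple substitution, you can swap the re.sub() call in the replace_multiple() helper for the significantly faster native str.replace().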

Finally, if you don't need complex patterns, you can get rid of regular expressions entirely:

QUOTE_STRINGS = ("'", "\\'", '"', '\\"')  # a list of substrings considered a 'quote'

def replace_multiple(source, replacements):  # a convenience multi-replacement function
    if not source:  # no need to process empty strings
        return ""
    for r in replacements:
        source = source.replace(r[0], r[1])
    return source

def replace_non_quoted(source, replacements):
    result = []  # a store for the result pieces
    head = 0  # a search head reference
    eos = len(source)  # a convenience string length reference
    quote = None  # last quote match literal
    quote_len = 0  # a convenience reference to the current quote substring length
    while True:
        if quote:  # we already have a matching quote stored
            index = source.find(quote, head + quote_len)  # find the closing quote
            if index == -1:  # EOS reached
                break
            result.append(source[head:index + quote_len])  # add the quoted string verbatim
            head = index + quote_len  # move the search head after the quoted match
            quote = None  # blank out the quote literal
        else:  # the current position is not in a quoted substring
            index = eos
            # find the first quoted substring from the current head position
            for entry in QUOTE_STRINGS:  # loop through all quote substrings
                candidate = source.find(entry, head)
                # use <= so a quote sitting exactly at the search head is not skipped
                if head <= candidate < index:
                    index = candidate
                    quote = entry
                    quote_len = len(entry)
            if not quote:  # EOS reached, no quote found
                break
            result.append(replace_multiple(source[head:index], replacements))
            head = index  # move the search head to the start of the quoted match
    if head < eos:  # if the search head is not at the end of the string
        result.append(replace_multiple(source[head:], replacements))
    return "".join(result)  # join back the result pieces and return them
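Since the question asks for escaped quotes to be handled, here is an alternative sketch that is not part of the original answer: a character-by-character scanner that treats a backslash inside a quoted string as escaping the next character. The names split_quoted and replace_outside_quotes are hypothetical, and the treatment of an unterminated quote (everything up to the end of the text stays untouched) is an assumption you may want to change.

```python
def split_quoted(source):
    """Return a list of (is_quoted, text) pieces covering the whole source."""
    pieces = []
    buf = []
    quote = None  # the quote character that opened the current string, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote is None:
            if ch in "'\"":  # a quote opens a new quoted piece
                if buf:
                    pieces.append((False, "".join(buf)))
                buf = [ch]
                quote = ch
            else:
                buf.append(ch)
        else:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # keep the escaped character verbatim and skip over it
                buf.append(source[i + 1])
                i += 1
            elif ch == quote:  # the matching quote closes the piece
                pieces.append((True, "".join(buf)))
                buf = []
                quote = None
        i += 1
    if buf:
        # assumption: an unterminated quoted string is left untouched
        pieces.append((quote is not None, "".join(buf)))
    return pieces

def replace_outside_quotes(source, replacements):
    out = []
    for is_quoted, text in split_quoted(source):
        if not is_quoted:
            for old, new in replacements:
                text = text.replace(old, new)  # plain substring replacement
        out.append(text)
    return "".join(out)

print(replace_outside_quotes("foo 'it\\'s a foo' foo", [("foo", "bar")]))
# bar 'it\'s a foo' bar
```

Unlike the functions above, this sketch does not treat \' outside a string as an opening delimiter; there the backslash is ordinary text and the quote that follows it opens a string.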

Regarding "python - Regex: replace text unless it is between quotes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49641089/
