
python - Replacing multiple entries in a text file using regular expressions

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:05:58

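I have a structured text file containing many multi-line records. Each record is supposed to have a unique key field. I need to read a series of these files, find the key fields that are not unique, and replace their values with unique ones.

My script already identifies every field that needs replacing. I store these in a dictionary where each key is a non-unique field value and each value is a list of unique replacement values.

For example: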

{
    "1111111111": ["1234566363", "5533356775", "6443458343"]
}

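What I want to do is read each file only once, find the occurrences of "1111111111" (the dictionary key), and replace the first match with the first value in the list, the second match with the second value, and so on.

I am trying to use a regular expression, but I am not sure how to construct a suitable RE without looping over the file multiple times.

This is my current code: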

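import re

def multireplace(Text, Vars):
    # Try longer keys first so a short key cannot match inside a longer one.
    dictSorted = sorted(Vars, key=len, reverse=True)
    regEx = re.compile('|'.join(map(re.escape, dictSorted)))
    # Replace each match with the value stored under the matched key.
    return regEx.sub(lambda match: Vars[match.group(0)], Text)

text = multireplace(text, find_replace_dict)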

It works when each key maps to a single string value, but when the value is a list it fails at run time:

return regEx.sub(lambda match: Vars[match.group(0)], Text , 1)
TypeError: sequence item 1: expected str instance, list found

Is it possible to change the function so that it works without looping over the file multiple times?
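
The TypeError above most likely arises because the function passed to `regex.sub` must return a string, whereas here it returns the whole list stored under the key. A minimal reproduction follows; the sample data and the helper name `try_sub` are made up for illustration and are not part of the original post.

```python
import re

# Hypothetical sample data: one key mapped to a list of replacements.
find_replace = {"1111111111": ["1234566363", "5533356775"]}
pattern = re.compile("|".join(map(re.escape, find_replace)))

def try_sub(text):
    """Return the substituted text, or the TypeError message if sub() fails."""
    try:
        # The lambda returns a list, not a str, whenever a key matches.
        return pattern.sub(lambda m: find_replace[m.group(0)], text)
    except TypeError as exc:
        return "TypeError: {}".format(exc)

print(try_sub("id=1111111111"))
```

Text that contains no key passes through unchanged, because the callback is never invoked.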

Best answer

Take a look and read the comments. Let me know if anything is unclear:

import re

def replace(text, replacements):
    # Copy each list so that pop() below doesn't destroy the original.
    # (A plain dict.copy() is shallow and would share the same lists.)
    replacements = {key: list(values) for key, values in replacements.items()}

    # This is essentially what you had already.
    regex = re.compile("|".join(map(re.escape, replacements.keys())))

    # In our lambda, we pop the first element from the list. This way,
    # each time we're called with the same match, we'll get the next replacement.
    return regex.sub(lambda m: replacements[m.group(0)].pop(0), text)

print(replace("A A B B A B", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))

# Output:
# A1 A2 B1 B2 A3 B3
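
One possible variation, sketched on the assumption that the replacement lists may be long or that some keys may be prefixes of others. `list.pop(0)` shifts every remaining element, so `collections.deque.popleft()` can be used instead, and sorting the keys longest-first (as in the question's code) keeps a shorter key from matching inside a longer one. The name `replace_with_queues` and the sample data are made up for this illustration.

```python
import re
from collections import deque

def replace_with_queues(text, replacements):
    # Build a fresh deque per key; the caller's lists are left untouched.
    queues = {key: deque(values) for key, values in replacements.items()}

    # Longest keys first, so "AB" is tried before "A" at the same position.
    keys = sorted(queues, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))

    # popleft() hands out the next unused replacement for the matched key.
    return regex.sub(lambda m: queues[m.group(0)].popleft(), text)

original = {"A": ["A1", "A2"], "AB": ["X1"]}
print(replace_with_queues("A AB A", original))

# Output:
# A1 X1 A2
```

Like the version above, this raises `IndexError` if a key occurs more often than it has replacements.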

Update

To help with the problem raised in the comments below, try this version, which tells you exactly which string ran out of replacements:

import re

def replace(text, replacements):

    # Use a named function so we can do a little more than the lambda could.
    def make_replacement(match):
        try:
            return replacements[match.group(0)].pop(0)
        except IndexError:
            # Print out debug info about what happened.
            print("Ran out of replacements for {}".format(match.group(0)))
            # Re-raise so the process still exits.
            raise

    # Copy each list so that pop() above doesn't destroy the original.
    # (A plain dict.copy() is shallow and would share the same lists.)
    replacements = {key: list(values) for key, values in replacements.items()}

    # This is essentially what you had already.
    regex = re.compile("|".join(map(re.escape, replacements.keys())))

    # make_replacement pops the first element from the list. This way,
    # each time it's called with the same match, we'll get the next replacement.
    return regex.sub(make_replacement, text)

# The text contains four "A"s but only three replacements, so this call fails.
print(replace("A A B B A B A", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))

# Output:
# Ran out of replacements for A
# (followed by the IndexError traceback)
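
To tie this back to the goal of reading each file only once, here is a rough sketch of how the same idea might be applied to a file on disk. It assumes the file is UTF-8 text that fits in memory, and it takes a different approach to exhausted keys: instead of raising, it leaves the extra matches unchanged and reports the keys that ran out. The function name `replace_in_file` is hypothetical.

```python
import re
from pathlib import Path

def replace_in_file(path, replacements):
    """Rewrite the file at `path` in place; return the keys that ran out."""
    # Copy the lists so the caller's dictionary is not modified.
    queues = {key: list(values) for key, values in replacements.items()}
    exhausted = set()
    if not queues:
        # An empty pattern would match everywhere, so do nothing.
        return exhausted

    def make_replacement(match):
        key = match.group(0)
        if queues[key]:
            return queues[key].pop(0)
        # No replacement left: remember the key and keep the text as-is.
        exhausted.add(key)
        return key

    # Longest keys first, as in the question's code.
    keys = sorted(queues, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))

    path = Path(path)
    text = path.read_text(encoding="utf-8")  # the file is read exactly once
    path.write_text(regex.sub(make_replacement, text), encoding="utf-8")
    return exhausted
```

A call such as `replace_in_file("records.txt", find_replace_dict)` (with a made-up file name) would then process one file in a single pass, and a non-empty return value signals that some duplicates were left in place.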

Regarding python - Replacing multiple entries in a text file using regular expressions, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44663718/
