
python - Mass string replace in python?

Reposted. Author: IT老高. Updated: 2023-10-28 21:32:58

Suppose I have a string that looks like this:

mystr = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

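You'll notice that in many places in the string there is an ampersand followed by a character (such as "&y" and "&c"). I need to replace these with the appropriate values from a dictionary, like this: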

mydict = {"&y":"\033[0;30m",
          "&c":"\033[0;31m",
          "&b":"\033[0;32m",
          "&Y":"\033[0;33m",
          "&u":"\033[0;34m"}

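What is the fastest way to do this? I could manually find all the ampersands and then loop through the dictionary to change them, but that seems slow. Doing a bunch of regex replacements also seems like it would be slow (my actual code will have a dictionary of about 30-40 pairs).

Any suggestions are greatly appreciated, thanks.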

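Edit:

As pointed out in the comments on this question, my dictionary is defined before runtime and will never change during the application's lifetime. It is a list of ANSI escape sequences with about 40 items. The average string I will be processing is about 500 characters long, but there will be strings of up to 5000 characters (although these will be rare). I am also currently using Python 2.6.

Edit #2: I accepted Tor Valamo's answer because it not only gave a valid solution (although not the *best* one), but also took all the other answers into account and did a great deal of work comparing them. It is one of the best, most helpful answers I have come across on StackOverflow. Kudos to you.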
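Because the dictionary is fixed for the application's lifetime, one option is to compile a single alternation of the escaped keys once. `re.sub` can then look up each match in the dictionary, so the string is scanned once no matter how many pairs there are. This is a minimal sketch, not code from the thread. The names `ANSI_CODES`, `_PATTERN`, and `colorize` are hypothetical, and the sketch targets Python 3.

```python
import re

# Hypothetical module-level table: fixed for the life of the application,
# so the pattern below only needs to be compiled once.
ANSI_CODES = {
    "&y": "\033[0;30m",
    "&c": "\033[0;31m",
    "&b": "\033[0;32m",
    "&Y": "\033[0;33m",
    "&u": "\033[0;34m",
}

# One alternation of all escaped keys. Python's re tries alternatives left to
# right, so sorting longest-first makes a longer key win if keys of different
# lengths are ever added.
_PATTERN = re.compile("|".join(
    re.escape(k) for k in sorted(ANSI_CODES, key=len, reverse=True)))


def colorize(text):
    """Replace every key in a single left-to-right pass over text."""
    return _PATTERN.sub(lambda m: ANSI_CODES[m.group(0)], text)


print(repr(colorize("The &yquick &cbrown &bfox &Yjumps over the &ulazy dog")))
```

An `&` that is not followed by a known code (for example `"a & b"`) is left untouched, because it simply never matches the pattern.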

Best answer

mydict = {"&y":"\033[0;30m",
          "&c":"\033[0;31m",
          "&b":"\033[0;32m",
          "&Y":"\033[0;33m",
          "&u":"\033[0;34m"}
mystr = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

for k, v in mydict.iteritems():
    mystr = mystr.replace(k, v)

print mystr
The ←[0;30mquick ←[0;31mbrown ←[0;32mfox ←[0;33mjumps over the ←[0;34mlazy dog
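The chained `replace` loop is safe for this particular table because none of the ANSI values contains `&`. In general, though, chained replacements depend on order: if a value contains another key, it gets rewritten again. The small illustration below uses a hypothetical two-entry mapping and assumes Python 3.7+, so the dictionary iterates in insertion order.

```python
import re

# Hypothetical mapping where one replacement value contains another key.
mapping = {"&a": "&b", "&b": "X"}
text = "&a &b"

# Chained str.replace: each pass sees the output of the previous one,
# so "&a" becomes "&b" and is then rewritten again to "X".
chained = text
for key, value in mapping.items():
    chained = chained.replace(key, value)

# Single pass with re.sub: each match is replaced exactly once, and the
# inserted text is never rescanned.
pattern = re.compile("|".join(map(re.escape, mapping)))
single_pass = pattern.sub(lambda m: mapping[m.group(0)], text)

print(chained)      # X X
print(single_pass)  # &b X
```

A single-pass substitution never rescans text it has already inserted. That makes it the safer default when values might contain keys.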

I took the liberty of comparing a few solutions:

mydict = dict([('&' + chr(i), str(i)) for i in list(range(65, 91)) + list(range(97, 123))])

# random inserts between keys
from random import randint
rawstr = ''.join(mydict.keys())
mystr = ''
for i in range(0, len(rawstr), 2):
    mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars
    # NOTE: as posted, the key rawstr[i:i+2] is never appended, so mystr
    # contains only filler and no '&' sequences to replace.

from time import time

# How many times to run each solution
rep = 10000

print 'Running %d times with string length %d and ' \
      'random inserts of lengths 0-20' % (rep, len(mystr))

# My solution
t = time()
for x in range(rep):
    for k, v in mydict.items():
        mystr.replace(k, v)  # result discarded; mystr itself never changes
    #print(mystr)
print '%-30s' % 'Tor fixed & variable dict', time()-t

from re import sub, compile, escape

# Peter Hansen
t = time()
for x in range(rep):
    sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict
print '%-30s' % 'Peter fixed & variable dict', time()-t

# Claudiu
def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = compile("(%s)" % "|".join(map(escape, dict.keys())))

    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

t = time()
for x in range(rep):
    multiple_replace(mydict, mystr)
print '%-30s' % 'Claudio variable dict', time()-t

# Claudiu - Precompiled
regex = compile("(%s)" % "|".join(map(escape, mydict.keys())))

t = time()
for x in range(rep):
    regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr)
print '%-30s' % 'Claudio fixed dict', time()-t

# Andrew Y - variable dict
def mysubst(somestr, somedict):
    subs = somestr.split("&")
    return subs[0] + "".join(map(lambda arg: somedict["&" + arg[0:1]] + arg[1:], subs[1:]))

t = time()
for x in range(rep):
    mysubst(mystr, mydict)
print '%-30s' % 'Andrew Y variable dict', time()-t

# Andrew Y - fixed
def repl(s):
    return mydict["&"+s[0:1]] + s[1:]

t = time()
for x in range(rep):
    subs = mystr.split("&")
    res = subs[0] + "".join(map(repl, subs[1:]))
print '%-30s' % 'Andrew Y fixed dict', time()-t

Results in Python 2.6:

Running 10000 times with string length 490 and random inserts of lengths 0-20
Tor fixed & variable dict 1.04699993134
Peter fixed & variable dict 0.218999862671
Claudio variable dict 2.48400020599
Claudio fixed dict 0.0940001010895
Andrew Y variable dict 0.0309998989105
Andrew Y fixed dict 0.0310001373291

Claudiu's and Andrew's solutions kept coming out at 0, so I had to increase the count to 10,000 runs.
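Raising the repetition count works around very short runs reading as 0 with `time()`. The standard library's `timeit` module is built for this kind of measurement. The sketch below assumes Python 3. The test string and the two function names are hypothetical stand-ins for the benchmark's data and solutions.

```python
import timeit

# Hypothetical setup mirroring the benchmark: a fixed dict and a test string
# that actually contains the keys.
mydict = {"&y": "\033[0;30m", "&c": "\033[0;31m", "&b": "\033[0;32m"}
mystr = "The &yquick &cbrown &bfox" * 20


def split_join(s, d):
    """Split-on-'&' approach, in the style of Andrew Y's answer."""
    parts = s.split("&")
    return parts[0] + "".join(d["&" + p[:1]] + p[1:] for p in parts[1:])


def chained_replace(s, d):
    """Chained str.replace approach, in the style of Tor's loop."""
    for k, v in d.items():
        s = s.replace(k, v)
    return s


# Sanity check: the candidates must agree before timing them.
assert split_join(mystr, mydict) == chained_replace(mystr, mydict)

# timeit.repeat runs `number` calls per trial. Taking the minimum over
# several trials reduces noise from other processes.
for func in (split_join, chained_replace):
    times = timeit.repeat(lambda: func(mystr, mydict), number=10000, repeat=3)
    print("%-20s %.4f s" % (func.__name__, min(times)))
```

Checking that the candidates produce the same output before timing them guards against comparing functions that do not actually do the same work.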

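I ran it in Python 3 (because of unicode), with replacements for the characters 39 to 1023, i.e. `range(39, 1024)` (38 is the ampersand, so I didn't want to include it). The string is up to 10,000 characters long, including about 980 replacements, with random inserts of variable length 0-20. Code points 39 to 1023 are 1 or 2 bytes long in UTF-8, which could affect some solutions.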
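A test string matching this description should contain every key once, separated by random filler. The sketch below shows such a generator and assumes Python 3. `make_test_string` is a hypothetical helper, and seeding the random generator is an added assumption for reproducibility.

```python
from random import randint, seed

# Keys for code points 39-1023 ('&' is 38 and is excluded), as described above.
mydict = dict(("&" + chr(i), str(i)) for i in range(39, 1024))


def make_test_string(keys):
    """Hypothetical helper: 0-20 filler chars (A-Z or '['), then each key."""
    pieces = []
    for key in keys:
        pieces.append(chr(randint(65, 91)) * randint(0, 20))  # random filler
        pieces.append(key)  # the key itself, so there is something to replace
    return "".join(pieces)


seed(0)  # assumption: fixed seed so every run uses the same data
mystr = make_test_string(mydict)
print(len(mystr), mystr.count("&"))
```

Because neither the filler nor the key characters include `&`, every `&` in the result marks exactly one key. That makes the replacement count easy to verify.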

mydict = dict([('&' + chr(i), str(i)) for i in range(39,1024)])

# random inserts between keys
from random import randint
rawstr = ''.join(mydict.keys())
mystr = ''
for i in range(0, len(rawstr), 2):
    mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars
    # NOTE: as posted, the key rawstr[i:i+2] is never appended, so mystr
    # contains only filler and no '&' sequences to replace.

from time import time

# How many times to run each solution
rep = 10000

print('Running %d times with string length %d and ' \
      'random inserts of lengths 0-20' % (rep, len(mystr)))

# Tor Valamo - too long
#t = time()
#for x in range(rep):
#    for k, v in mydict.items():
#        mystr.replace(k, v)
#print('%-30s' % 'Tor fixed & variable dict', time()-t)

from re import sub, compile, escape

# Peter Hansen
# (the pattern only matches '&' plus an ASCII letter, a subset of the keys here)
t = time()
for x in range(rep):
    sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict
print('%-30s' % 'Peter fixed & variable dict', time()-t)

# Peter 2
def dictsub(m):
    return mydict[m.group()]

t = time()
for x in range(rep):
    sub(r'(&[a-zA-Z])', dictsub, mystr)
print('%-30s' % 'Peter fixed dict', time()-t)

# Claudiu - too long
#def multiple_replace(dict, text):
#    # Create a regular expression from the dictionary keys
#    regex = compile("(%s)" % "|".join(map(escape, dict.keys())))
#
#    # For each match, look-up corresponding value in dictionary
#    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)
#
#t = time()
#for x in range(rep):
#    multiple_replace(mydict, mystr)
#print('%-30s' % 'Claudio variable dict', time()-t)

# Claudiu - Precompiled
regex = compile("(%s)" % "|".join(map(escape, mydict.keys())))

t = time()
for x in range(rep):
    regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr)
print('%-30s' % 'Claudio fixed dict', time()-t)

# Separate setup for Andrew and gnibbler optimized dict
mydict = dict((k[1], v) for k, v in mydict.items())

# Andrew Y - variable dict
def mysubst(somestr, somedict):
    subs = somestr.split("&")
    return subs[0] + "".join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:]))

# NOTE: mysubst2 uses subs[0] as the join separator instead of a prefix, so
# its output is only guaranteed to match mysubst when subs[0] is empty.
def mysubst2(somestr, somedict):
    subs = somestr.split("&")
    return subs[0].join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:]))

t = time()
for x in range(rep):
    mysubst(mystr, mydict)
print('%-30s' % 'Andrew Y variable dict', time()-t)
t = time()
for x in range(rep):
    mysubst2(mystr, mydict)
print('%-30s' % 'Andrew Y variable dict 2', time()-t)

# Andrew Y - fixed
def repl(s):
    return mydict[s[0:1]] + s[1:]

t = time()
for x in range(rep):
    subs = mystr.split("&")
    res = subs[0] + "".join(map(repl, subs[1:]))
print('%-30s' % 'Andrew Y fixed dict', time()-t)

# gnibbler
t = time()
for x in range(rep):
    myparts = mystr.split("&")
    myparts[1:] = [mydict[p[0]] + p[1:] for p in myparts[1:]]
    "".join(myparts)
print('%-30s' % 'gnibbler fixed & variable dict', time()-t)

Results:

Running 10000 times with string length 9491 and random inserts of lengths 0-20
Tor fixed & variable dict 0.0 # disqualified 329 secs
Peter fixed & variable dict 2.07799983025
Peter fixed dict 1.53100013733
Claudio variable dict 0.0 # disqualified, 37 secs
Claudio fixed dict 1.5
Andrew Y variable dict 0.578000068665
Andrew Y variable dict 2 0.56299996376
Andrew Y fixed dict 0.56200003624
gnibbler fixed & variable dict 0.530999898911

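(Note: gnibbler's code uses a different dictionary, whose keys do not include the "&". Andrew's code also uses this alternate dictionary, but it made little difference, perhaps a 0.01x speedup.)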
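The split-based approach that wins here can be wrapped as a function that uses the same `&`-less dictionary. The sketch below assumes Python 3. `codes` and `split_replace` are hypothetical names. The fallback for unknown codes or a trailing `&` is an addition: the benchmarked code would raise an exception in those cases.

```python
# Hypothetical char-keyed table: the character after '&' maps to its value,
# like the alternate dictionary used by gnibbler's and Andrew's code.
codes = {"y": "\033[0;30m", "c": "\033[0;31m", "b": "\033[0;32m",
         "Y": "\033[0;33m", "u": "\033[0;34m"}


def split_replace(text, table):
    """Replace '&<char>' sequences by splitting on '&' (gnibbler-style).

    Unknown codes and a trailing '&' are kept as-is instead of raising.
    """
    parts = text.split("&")
    out = [parts[0]]  # text before the first '&' is never a code
    for part in parts[1:]:
        if part and part[0] in table:
            out.append(table[part[0]] + part[1:])
        else:
            out.append("&" + part)  # restore the '&' that split() removed
    return "".join(out)


print(repr(split_replace("&yred &zplain, trailing &", codes)))
```

Building a list and joining once keeps the work linear in the length of the string. That is the main reason this family of solutions beats the regex-based ones in the timings above.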

A similar question, "python - Mass string replace in python?", can be found on Stack Overflow: https://stackoverflow.com/questions/1919096/
