
python - Using .join() in Python with a maximum string length

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:37:42

I want to join a list of IDs into a single string in which the IDs are separated by ' OR '. In Python, I can do that with

' OR '.join(list_of_ids)

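I would like to know whether there is a way to keep this string from getting too large (in bytes). This matters to me because I pass the string to an API, and that API imposes a maximum length of 4094 bytes. My solution is below; is there a better one?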

import sys

list_of_query_strings = []
substring = list_of_ids[0]
list_of_ids.pop(0)
while list_of_ids:
    new_addition = ' OR ' + list_of_ids[0]
    if sys.getsizeof(substring + new_addition) < 4094:
        substring += new_addition
    else:
        list_of_query_strings.append(substring)
        substring = list_of_ids[0]
    list_of_ids.pop(0)
list_of_query_strings.append(substring)
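
A side note on the size check in the code above: sys.getsizeof reports the in-memory size of the Python str object, which includes interpreter overhead, so it is not the number of bytes sent over the wire. The sketch below compares the two for a small query. The IDs are made up, UTF-8 is assumed as the encoding the API counts, and the comparison reflects CPython's behaviour.

```python
import sys

query = ' OR '.join(['id1', 'id2', 'id3'])  # hypothetical IDs

# Bytes the string occupies once encoded (assuming the API counts UTF-8 bytes)
encoded_size = len(query.encode('utf-8'))

# In-memory size of the Python str object, including interpreter overhead
object_size = sys.getsizeof(query)

print(encoded_size)                # 17
print(object_size > encoded_size)  # True on CPython
```

With a check based on sys.getsizeof, the blocks would likely end up somewhat smaller than the limit allows rather than larger, but the number being compared is still not the one the API limit refers to.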

Best answer

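Just for fun, here is an over-engineered solution (it avoids the Schlemiel the Painter's algorithm of repeated concatenation and lets you use str.join to combine the pieces efficiently):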

from itertools import count, groupby

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize):
        self.joinerlen = len(joiner)
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        self.accumlen += self.joinerlen + len(newstr)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newstr)
        return self.curgrp

With this, you use itertools.groupby to break your iterable into groups of bounded size, then join each group without any repeated concatenation:

mystrings = [...]

myblocks = [' OR '.join(grp) for _, grp in
            groupby(mystrings, key=CumulativeLengthGrouper(' OR ', 4094))]
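
To see how the grouping behaves, here is a small self-contained run. The grouper class is repeated only so the snippet runs on its own; the sample IDs and the deliberately tiny limit of 12 characters are made up for illustration.

```python
from itertools import count, groupby


class CumulativeLengthGrouper:
    """Same grouper as above, repeated so this snippet runs on its own."""

    def __init__(self, joiner, maxblocksize):
        self.joinerlen = len(joiner)
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Start negative so the first element is not charged for a joiner
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        self.accumlen += self.joinerlen + len(newstr)
        if self.accumlen > self.maxblocksize:
            # Start a new group containing only the current element
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newstr)
        return self.curgrp


# Hypothetical sample IDs and a tiny limit so the splits are easy to follow
sample_ids = ['aaa', 'bbb', 'ccc', 'dd', 'eeeee']
blocks = [' OR '.join(grp) for _, grp in
          groupby(sample_ids, key=CumulativeLengthGrouper(' OR ', 12))]
print(blocks)  # ['aaa OR bbb', 'ccc OR dd', 'eeeee']
```

This relies on groupby calling the key function once per element, in order, which is what lets a stateful key object track the running length. Note that a single element longer than the limit still ends up in a block of its own, so the limit can only be honoured if every individual ID fits.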

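If the goal is to produce strings that fit within a given size in bytes under a specific encoding, you can adapt CumulativeLengthGrouper to accept a third constructor argument: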

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize, encoding='utf-8'):
        self.encoding = encoding
        self.joinerlen = len(joiner.encode(encoding))
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        # Use the encoding stored on the instance (a bare `encoding` is undefined here)
        newbytes = newstr.encode(self.encoding)
        self.accumlen += self.joinerlen + len(newbytes)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newbytes)
        return self.curgrp
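
For comparison, the same byte-based splitting can also be sketched as a plain generator without groupby. This is an alternative, not the method from the answer: join_in_byte_blocks is a hypothetical helper name, the IDs are made up, and the size arithmetic assumes an encoding such as UTF-8, where encoding the pieces separately gives the same total as encoding the joined string (encodings that prepend a byte order mark on every call would break that assumption).

```python
def join_in_byte_blocks(strings, joiner, maxblocksize, encoding='utf-8'):
    """Yield joined strings whose encoded size stays within maxblocksize.

    Hypothetical helper, not part of the original answer. A single item
    larger than maxblocksize is still yielded on its own.
    """
    joinerlen = len(joiner.encode(encoding))
    current, currentlen = [], 0
    for s in strings:
        slen = len(s.encode(encoding))
        # Size of the block if s were appended (joiner only between items)
        newlen = slen if not current else currentlen + joinerlen + slen
        if current and newlen > maxblocksize:
            # Emit the finished block and start a new one with s
            yield joiner.join(current)
            current, newlen = [], slen
        current.append(s)
        currentlen = newlen
    if current:
        yield joiner.join(current)


# Hypothetical IDs; 'é' takes 2 bytes in UTF-8, so 'éé' is 4 bytes
ids = ['ab', 'éé', 'cd']
blocks = list(join_in_byte_blocks(ids, ' OR ', 10))
print(blocks)                                    # ['ab OR éé', 'cd']
print([len(b.encode('utf-8')) for b in blocks])  # [10, 2]
```

The first block is only 8 characters long but exactly 10 bytes once encoded, which is why counting characters with len(newstr) is not enough when the IDs may contain non-ASCII text.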

Regarding "python - Using .join() in Python with a maximum string length", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47542374/
