Separate lru for each argument value?

Reposted. Author: bug小助手. Updated: 2023-10-25 09:15:03

I need to write a SQL database column getter that takes a column name and a time, and returns the entire column of values for that column at the given time. This function may be called frequently with the same arguments, so I would like to use an LRU cache. However, I'm not sure whether the column names are requested with uniform frequency, so ideally I would have a separate LRU cache for each column name.

I previously had it like below, but I would like to separate the lru for each col_name.

@lru_cache(...)
def get_col(self, col_name, time):
    # do stuff to get the column and return it

How can I achieve this? Also, unfortunately, I have to support py2.

Comments

Is the @lru_cache you are using in Python 2 a backport using some external library?

@matszwecja I'm actually not familiar with it (I'm working in an existing codebase). I think it's customized and based on the one in functools. The one in functools is for py3+, iirc.

So are you writing for Python2 or Python3? Writing something that will work for both is not a viable solution, and Python2 has been pretty much a dead language for over 3 years now.

@matszwecja I have to write in py2 for this particular case. I don't have a say on what version of python I can use

Okay, at least it's not 2 different versions. I might try to come up with something that should be portable to Py2, but I have never written anything in 2, so you'd have to adapt to any differences on your own.

Recommended answers

Since functools.lru_cache() was only introduced in Python 3 (in 3.2), you will need to manage a separate cache for each column name yourself, meaning:

  • Create an LRU cache class or use a third-party library that supports Python 2 to implement LRU cache functionality.

  • Inside your function, maintain a dictionary where each key is a column name and the value is an instance of an LRU cache.

  • Before fetching the column data in the get_col function, check if the column name is in the dictionary. If it is, use the associated LRU cache to get the data. If it is not, create a new LRU cache for that column name.


Something like this LRUCache class:

from collections import OrderedDict

class LRUCache(object):
    def __init__(self, maxsize=3):  # Adjust maxsize as necessary
        self.cache = OrderedDict()
        self.maxsize = maxsize

    def get(self, key):
        sentinel = object()
        value = self.cache.pop(key, sentinel)
        if value is sentinel:
            return None
        else:
            # If key was in dictionary, reinsert it at the end
            self.cache[key] = value
            return value

    def put(self, key, value):
        # Drop any existing entry first so an update never evicts another key
        self.cache.pop(key, None)
        if len(self.cache) >= self.maxsize:
            # Remove oldest entry
            self.cache.popitem(last=False)
        self.cache[key] = value

# Dictionary to hold separate LRU caches for each column name
column_caches = {}

def get_col(col_name, time):
    # Get the LRU cache for the specified column name
    # If it does not exist, create a new one
    cache = column_caches.get(col_name)
    if cache is None:
        cache = LRUCache(maxsize=3)  # Adjust maxsize as necessary
        column_caches[col_name] = cache

    # Try to get the value from the cache
    result = cache.get(time)
    if result is not None:
        print(cache.cache)  # Print the state of the cache
        return result

    # If the value is not in the cache, fetch it
    result = "{} at {}".format(col_name, time)  # Dynamic data generation based on input parameters

    # Store the fetched value in the cache and return it
    cache.put(time, result)
    print(cache.cache)  # Print the state of the cache
    return result

# Testing
print(get_col('foo', 1))
print(get_col('foo', 2))
print(get_col('foo', 3))
print(get_col('foo', 4))
print(get_col('foo', 3))
print(get_col('foo', 1))

Output: (demo tio.run)

OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1


An LRU cache is simply a mapping that preserves insertion order, so that the most recently used item can be moved to one end and the least recently used item can be removed from the other end when the maximum size of the cache is reached. In both Python 2 and Python 3, such a data structure can be found in the standard library as collections.OrderedDict, whose popitem(last=False) method removes the oldest item.
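
As a rough illustration of that ordering behaviour (a minimal sketch, not part of the original answer; the keys and values are made up), popping a key and re-inserting it makes it the newest entry, and popitem(last=False) then evicts whichever entry is oldest:

```python
from collections import OrderedDict

# Hypothetical three-entry cache; keys are inserted oldest-first.
cache = OrderedDict()
for key in ('a', 'b', 'c'):
    cache[key] = key.upper()

# "Use" key 'a': popping and re-inserting it makes it the newest entry.
cache['a'] = cache.pop('a')
order_after_use = list(cache)

# Evict the least recently used entry, which is now the oldest one.
evicted = cache.popitem(last=False)
remaining = list(cache)
```

Here order_after_use is ['b', 'c', 'a'], the evicted pair is ('b', 'B'), and the keys left are ['c', 'a']. This works the same way in Python 2.7 and Python 3.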

To more easily initialize an LRU cache for each column name, you can also use collections.defaultdict:

from collections import defaultdict, OrderedDict

class Client:
    def __init__(self, cache_size=3):  # a cache size of only 3 for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            cache[time] = cache.pop(time)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)
            cache[time] = '%s at %s' % (col_name, time)  # make DB query here
        print(cache)
        return cache[time]

so that:

c = Client()
print(c.get_col('foo', 1))
print(c.get_col('foo', 2))
print(c.get_col('foo', 3))
print(c.get_col('foo', 4))
print(c.get_col('foo', 3))
print(c.get_col('foo', 1))

outputs:

OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1

Demo: Try it online!

Note that since Python 3.2, collections.OrderedDict also has a move_to_end method that moves an existing item to the end (the most recently used position), so in Python 3 you can change the line:

cache[time] = cache.pop(time)

to simply:

cache.move_to_end(time)
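
For reference, a Python 3 only sketch of the same per-column cache using move_to_end might look like the following. It is not usable on Python 2, where OrderedDict has no such method, and the formatted string is still just a stand-in for the real DB query:

```python
from collections import OrderedDict, defaultdict


class Client:
    def __init__(self, cache_size=3):  # small size for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            # Hit: mark the entry as most recently used (raises KeyError on a miss).
            cache.move_to_end(time)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[time] = '%s at %s' % (col_name, time)  # placeholder for the DB query
        return cache[time]
```

Calling get_col('foo', t) for t in 1, 2, 3, 4, 3, 1 should leave the 'foo' cache holding the keys 4, 3, 1 in that order, the same as the pop-and-reinsert version.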


We may preserve the logic of an existing implementation of lru_cache but organise a mapping of chosen argument values to separate caches within an outer decorator. Here is a sample implementation:

from functools import lru_cache, wraps
from inspect import getcallargs


def lru_agrument_cache(argument_name):
    def decorator(function):
        # one independently cached copy of the function per argument value
        callable_buckets = {}

        @wraps(function)
        def wrapper(*args, **kwargs):
            inspection = getcallargs(function, *args, **kwargs)

            # should use functools._make_key() for more general use
            bucket_key = inspection[argument_name]

            if bucket_key not in callable_buckets:
                # lru_cache()(function) also works before Python 3.8,
                # where the bare lru_cache(function) form is rejected
                callable_buckets[bucket_key] = lru_cache()(function)

            return callable_buckets[bucket_key](*args, **kwargs)

        # just to demonstrate usage
        def cache_info(argument_value):
            try:
                return callable_buckets[argument_value].cache_info()
            except KeyError:
                return None

        wrapper.cache_info = cache_info
        return wrapper

    return decorator

Usage:

@lru_agrument_cache('key')
def my_callable(key, times):
    return key * times

Verify:

my_callable('A', 2)
Out[16]: 'AA'
my_callable('A', 2)
Out[17]: 'AA'
my_callable('A', 3)
Out[18]: 'AAA'
my_callable('B', 2)
Out[19]: 'BB'

my_callable.cache_info('A')
Out[20]: CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
my_callable.cache_info('B')
Out[21]: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)



  • inspect is also available in Python 2 (getcallargs since 2.7);

  • functools.lru_cache should be replaced with whichever lru_cache implementation is currently in use;

  • functools.wraps and its dependency functools.partial, if not available, can be taken from the CPython sources;

  • I'm not sure what to add concerning thread safety, though.
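
On the thread-safety point, one possible approach (a sketch only, not from the original answer) is to guard the creation of each per-argument cache with a lock, so that two threads never build two caches for the same argument value. The name locked_argument_cache and the buckets attribute are hypothetical. The sketch assumes the underlying lru_cache is itself safe to call from several threads, which the Python 3 documentation states for functools.lru_cache but which is an assumption for any custom Python 2 backport:

```python
import threading
from functools import lru_cache, wraps
from inspect import getcallargs


def locked_argument_cache(argument_name):  # hypothetical name
    def decorator(function):
        buckets = {}
        buckets_lock = threading.Lock()

        @wraps(function)
        def wrapper(*args, **kwargs):
            bucket_key = getcallargs(function, *args, **kwargs)[argument_name]
            with buckets_lock:
                # Check-and-create happens under the lock, so only one
                # cache is ever built per argument value.
                cached = buckets.get(bucket_key)
                if cached is None:
                    cached = buckets[bucket_key] = lru_cache()(function)
            # Call outside the lock so a slow call does not block other threads.
            return cached(*args, **kwargs)

        wrapper.buckets = buckets  # exposed for inspection only
        return wrapper

    return decorator


@locked_argument_cache('key')
def repeat(key, times):
    return key * times
```

The lock is held only while looking up or creating a bucket, not while the wrapped function runs. That keeps contention low, but it also means two threads missing on the same arguments at the same time may both execute the function once.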


Comments

Downvoted. Firstly, what you implemented is not an LRU cache to begin with, as it does not move the most recently used item to the front. Secondly, your dict-based implementation does not work in Python 2, where dict keys are unordered. Thirdly, the inclusion of col_name in the cache key is redundant since each col_name has its own cache. And lastly, testing if the return value of cache.get(cache_key) is None is prone to error since you can't tell if you're getting None because the given key is missing or because the value of the key is actually None.

@blhsing Good points, thank you for your feedback. I have revised the code accordingly.

It appears that the Python 3 LRU cache avoids rehashing a key. Even though computing the hash does not appear to be particularly expensive in this case, you are needlessly hashing the key twice when it is in the cache: first when you check whether the key exists in the dictionary, and then again when you get the key's value. You can avoid this by doing value = self.cache.pop(key, sentinel) and then checking whether what is returned is the sentinel, i.e. some value that cannot possibly be in the cache.

@Booboo Thank you for the feedback, I have updated the code to use a sentinel value.
