
Separate LRU cache for each argument value?

Reposted · updated 2023-10-24 18:29:32



I need to write a sql database column getter, that takes in a column name and time, and returns the entire column of values for that column corresponding to the input time. This may be a frequent function call with the same arguments, so I would like to use an lru cache. However, I'm not sure if the frequency of the column names is uniformly distributed, so ideally, I would have a separate lru cache for each column name.



I previously had it like below, but I would like to separate the lru for each col_name.



@lru_cache(...)
def get_col(self, col_name, time):
    # do stuff to get the column and return it

How can I achieve this? Also, unfortunately, I have to support py2.



More replies

Is the @lru_cache you are using in Python 2 a backport using some external library?


@matszwecja I'm actually not familiar (I'm working in an existing codebase). I think it's customized and based on the one in functools; the one in functools is for py3+, iirc.

So are you writing for Python 2 or Python 3? Writing something that works for both is not a viable solution, and Python 2 has been pretty much a dead language for over 3 years now.

@matszwecja I have to write in py2 for this particular case. I don't have a say on what version of python I can use


Okay, at least it's not 2 different versions. I might try to come up with something that should be portable to Py2, but I've never written anything in 2, so you'd have to adapt to any differences on your own.

Top answers

Since functools.lru_cache() was introduced only in Python 3.2, you will need to manage separate cache dictionaries for each column name, meaning:



  • Create an LRU cache class or use a third-party library that supports Python 2 to implement LRU cache functionality.

  • Inside your function, maintain a dictionary where each key is a column name and the value is an instance of an LRU cache.

  • Before fetching the column data in the get_col function, check if the column name is in the dictionary. If it is, use the associated LRU cache to get the data. If it is not, create a new LRU cache for that column name.


Something like this LRUCache class:



from collections import OrderedDict

class LRUCache(object):
    def __init__(self, maxsize=3):  # Adjust maxsize as necessary
        self.cache = OrderedDict()
        self.maxsize = maxsize

    def get(self, key):
        sentinel = object()
        value = self.cache.pop(key, sentinel)
        if value is sentinel:
            return None
        else:
            # If key was in dictionary, reinsert it at the end
            self.cache[key] = value
            return value

    def put(self, key, value):
        if len(self.cache) >= self.maxsize:
            # Remove oldest entry
            self.cache.popitem(last=False)
        self.cache[key] = value

# Dictionary to hold separate LRU caches for each column name
column_caches = {}

def get_col(col_name, time):
    # Get the LRU cache for the specified column name
    # If it does not exist, create a new one
    cache = column_caches.get(col_name)
    if cache is None:
        cache = LRUCache(maxsize=3)  # Adjust maxsize as necessary
        column_caches[col_name] = cache

    # Try to get the value from the cache
    result = cache.get(time)
    if result is not None:
        print(cache.cache)  # Print the state of the cache
        return result

    # If the value is not in the cache, fetch it
    result = "{} at {}".format(col_name, time)  # Dynamic data generation based on input parameters

    # Store the fetched value in the cache and return it
    cache.put(time, result)
    print(cache.cache)  # Print the state of the cache
    return result

# Testing
print(get_col('foo', 1))
print(get_col('foo', 2))
print(get_col('foo', 3))
print(get_col('foo', 4))
print(get_col('foo', 3))
print(get_col('foo', 1))

Output: (demo tio.run)



OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1


An LRU cache is simply a mapping that preserves insertion order, so the most recently used item can be moved to the end, and the least recently used item, sitting at the front, can be removed when the maximum size of the cache is reached. In both Python 2 and Python 3, such a data structure can be found in the standard library as collections.OrderedDict, whose popitem(last=False) method removes the oldest entry.
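These two mechanics can be seen directly on a bare OrderedDict (the keys here are just illustrative):

```python
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3

# "Touch" 'a': pop-and-reinsert moves it to the end (most recently used)
d['a'] = d.pop('a')
print(list(d))  # ['b', 'c', 'a']

# Evict the least recently used entry from the front
d.popitem(last=False)
print(list(d))  # ['c', 'a']
```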


To more easily initialize an LRU cache for each column name, you can also use collections.defaultdict:



from collections import defaultdict, OrderedDict

class Client(object):  # new-style class for Python 2 compatibility
    def __init__(self, cache_size=3):  # a cache size of only 3 for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            cache[time] = cache.pop(time)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)
            cache[time] = '%s at %s' % (col_name, time)  # make DB query here
        print(cache)
        return cache[time]

so that:



c = Client()
print(c.get_col('foo', 1))
print(c.get_col('foo', 2))
print(c.get_col('foo', 3))
print(c.get_col('foo', 4))
print(c.get_col('foo', 3))
print(c.get_col('foo', 1))

outputs:



OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1

Demo: Try it online!



Note that since Python 3.2, collections.OrderedDict also has a move_to_end method to move an item to the end, so you can change the line:


cache[time] = cache.pop(time)

to simply:

简单地说:


cache.move_to_end(time)
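The two forms produce the same ordering, as a quick sanity check shows (move_to_end is Python 3.2+ only):

```python
from collections import OrderedDict

a = OrderedDict([(1, 'x'), (2, 'y'), (3, 'z')])
b = OrderedDict([(1, 'x'), (2, 'y'), (3, 'z')])

a[1] = a.pop(1)   # pop-and-reinsert, works on Python 2 and 3
b.move_to_end(1)  # Python 3.2+ shortcut

print(list(a) == list(b))  # True; both are [2, 3, 1]
```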


We can preserve the logic of an existing lru_cache implementation but organise a mapping from a chosen argument's values to separate caches within an outer decorator. Here is a sample implementation:


from functools import lru_cache, wraps
from inspect import getcallargs


def lru_agrument_cache(argument_name):
    def decorator(function):
        callable_buckets = {}

        @wraps(function)
        def wrapper(*args, **kwargs):
            inspection = getcallargs(function, *args, **kwargs)

            # should use functools._make_key() for more general use
            bucket_key = inspection[argument_name]

            if bucket_key in callable_buckets:
                return callable_buckets[bucket_key](*args, **kwargs)

            callable_buckets[bucket_key] = lru_cache(maxsize=128)(function)

            return callable_buckets[bucket_key](*args, **kwargs)

        # just to demonstrate usage
        def cache_info(argument_value):
            try:
                return callable_buckets[argument_value].cache_info()
            except KeyError:
                return None

        wrapper.cache_info = cache_info
        return wrapper

    return decorator

Usage:



@lru_agrument_cache('key')
def my_callable(key, times):
    return key * times

Verify:



my_callable('A', 2)
Out[16]: 'AA'
my_callable('A', 2)
Out[17]: 'AA'
my_callable('A', 3)
Out[18]: 'AAA'
my_callable('B', 2)
Out[19]: 'BB'

my_callable.cache_info('A')
Out[20]: CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
my_callable.cache_info('B')
Out[21]: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)



  • inspect.getcallargs is also available in Python 2;

  • functools.lru_cache should be replaced with the lru_cache implementation currently in use;

  • functools.wraps and its dependency functools.partial, if not available, can be taken from the CPython sources;

  • not sure what to add concerning thread safety, though.


More replies

Downvoted. Firstly, what you implemented is not an LRU cache to begin with, as it does not move the most recently used item to the front. Secondly, your dict-based implementation does not work in Python 2, where dict keys are unordered. Thirdly, the inclusion of col_name in the cache key is redundant since each col_name has its own cache. And lastly, testing if the return value of cache.get(cache_key) is None is prone to error since you can't tell if you're getting None because the given key is missing or because the value of the key is actually None.

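The last point is easy to demonstrate: with a plain get(), a stored None is indistinguishable from a miss, while a private sentinel object removes the ambiguity (values here are illustrative):

```python
cache = {}
cache['key'] = None  # a legitimately cached value that happens to be None

# Looks like a miss even though the key is present
print(cache.get('key') is None)  # True
print('key' in cache)            # also True

# A private sentinel object cannot collide with any stored value
sentinel = object()
value = cache.get('key', sentinel)
print(value is sentinel)  # False: a real hit whose value is just None
```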

@blhsing Good points, thank you for your feedback. I have revised the code accordingly.


It appears that the Python 3 LRU cache avoids rehashing a key. Even though computing the hash does not appear to be particularly expensive in this case, you are needlessly hashing the key twice when it is in the cache: first by checking whether the key exists in the dictionary, and then again when you get the key's value. You can avoid this by doing value = self.cache.pop(key, sentinel) and then checking whether what is returned is the sentinel, i.e. some value that cannot possibly be in the cache.
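A sketch of that single-lookup pattern (the key and value are illustrative):

```python
from collections import OrderedDict

cache = OrderedDict([('k', 42)])
sentinel = object()

# One hash/lookup: pop returns the sentinel on a miss, or the value on a
# hit (removing the entry so it can be reinserted at the end)
value = cache.pop('k', sentinel)
if value is sentinel:
    value = None        # miss: fall through to fetching the real data
else:
    cache['k'] = value  # hit: reinsert at the end to mark as recently used

print(value)  # 42
```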

@Booboo Thank you for the feedback, I have updated the code to use a sentinel value.

