python - Cython FIFO cache for function results

Reposted. Author: 行者123. Updated: 2023-12-04 15:26:26

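I need some kind of cache to store the results of a function f in Cython for future reuse. A simple FIFO policy that discards the oldest computed result when the cache is full will do. The cache has to be reinitialised every time I call, from Python, another function that uses the cache and calls f. I came up with the following solution, using a std::map wrapped in an extension type: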

# distutils: language = c++

import sys
import time

from libcpp.map cimport map as cppmap
from libcpp.utility cimport pair as cpppair
from libcpp.queue cimport queue as cppqueue
from cython.operator cimport dereference as deref

ctypedef cpppair[long, long] mapitem_t
ctypedef cppmap[long, long].iterator mi_t


cdef class Cache_map:
    """Cache container"""
    cdef:
        cppmap[long, long] _cache_data
        cppqueue[long] _order
        long _cachesize
        long _size

    def __init__(self, long cachesize=100):
        self._cachesize = cachesize
        self._size = 0

    cdef mi_t setitem(
            self, mi_t it, long key, long value):
        """Insert key/value pair into cache and return position"""

        if self._size >= self._cachesize:
            self._cache_data.erase(self._order.front())
            self._order.pop()
        else:
            self._size += 1
        self._order.push(key)
        return self._cache_data.insert(it, mapitem_t(key, value))

    @property
    def cache_data(self):
        return self._cache_data


cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2


cdef long cached_f(long x, Cache_map Cache):
    cdef mi_t search = Cache._cache_data.lower_bound(x)

    if search != Cache._cache_data.end() and x == deref(search).first:
        return deref(search).second
    return deref(Cache.setitem(search, x, f(x))).second


def use_cache():
    # Output container
    cdef list cache_size = []
    cdef list timings = []
    cdef list results = []

    cdef long i, r
    cdef Cache_map Cache = Cache_map(10)  # Initialise cache

    cache_size.append(sys.getsizeof(Cache))
    go = time.time()
    for i in range(100):
        # Silly loop using the cache
        for r in range(2):
            results.append(cached_f(i, Cache))
            timings.append(time.time() - go)
            go = time.time()
        cache_size.append(sys.getsizeof(Cache))
        go = time.time()

    return cache_size, timings, results

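While this works in principle, it has a few drawbacks:

  • I have to create cached_f manually to wrap f (not very reusable)
  • I have to pass Cache to cached_f (unnecessarily expensive???)
  • Cache_map is written explicitly to cache results from f (not very reusable)

I would imagine this is quite a standard task, so is there a better way?

I tried, for example, to pass a pointer to the cache to cached_f, but it seems I cannot create a pointer to an extension type object? The following: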

cdef Cache_map Cache = Cache_map(10)
cdef Cache_map *Cache_ptr

Cache_ptr = &Cache

throws cache_map.pyx:66:16: Cannot take address of Python variable 'Cache'
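A note on that error: a variable typed as an extension type, such as Cache, already holds a reference to a Python object, so there is no separate C value whose address could be taken. Passing Cache to cached_f therefore hands over a reference and does not copy the underlying map. The pure-Python sketch below only illustrates those reference semantics; the Cache class and the store function are hypothetical stand-ins and not part of the code above.

```python
class Cache:
    """Hypothetical stand-in for the Cache_map extension type."""

    def __init__(self):
        self.data = {}


def store(cache, key, value):
    # `cache` is the caller's object itself, not a copy of it
    cache.data[key] = value
    return id(cache)


shared = Cache()
seen_id = store(shared, 3, 9)

print(seen_id == id(shared))  # True: the callee saw the very same object
print(shared.data)            # {3: 9}: the mutation is visible to the caller
```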

Best answer

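I think that, from a software engineering point of view, it is a good idea to bundle the function (which in C/cdef-Cython is a function pointer/functor) and its memoization store in one object/class.

My approach would be to write a cdef class (let's call it FunWithMemoization) that holds a function pointer and a memoization data structure for storing known results.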
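To make the idea concrete before turning to Cython, here is a minimal pure-Python sketch of such a bundle. It is only a model of the design, not the Cython class itself: the name FunWithFifoMemo, the max_size parameter and the square helper are assumptions made for illustration, and max_size is assumed to be at least 1.

```python
from collections import deque


class FunWithFifoMemo:
    """Bundle a function with a bounded FIFO store of its results.

    Hypothetical illustration; assumes max_size >= 1.
    """

    def __init__(self, fun, max_size=128):
        self.fun = fun
        self.max_size = max_size
        self.results = {}          # key -> cached value
        self.key_order = deque()   # keys in insertion order, oldest first

    def evaluate(self, x):
        if x in self.results:
            return self.results[x]
        value = self.fun(x)
        if len(self.results) >= self.max_size:
            # FIFO eviction: drop the oldest key from both structures
            oldest = self.key_order.popleft()
            del self.results[oldest]
        self.results[x] = value
        self.key_order.append(x)
        return value


calls = []


def square(x):
    calls.append(x)  # record every real evaluation
    return x * x


memo = FunWithFifoMemo(square, max_size=2)
memo.evaluate(1)
memo.evaluate(2)
memo.evaluate(1)  # cache hit, square is not called
memo.evaluate(3)  # cache full: evicts key 1, the oldest
memo.evaluate(1)  # miss again, recomputed (and evicts key 2)
print(calls)      # [1, 2, 3, 1]
```

Keeping the insertion order in a separate queue is what makes the policy FIFO: a cache hit does not reorder anything, unlike an LRU cache.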

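Because life is too short to write C++ code in Cython, I wrote the memoization class in pure C++ (the full code can be found further below). It is more or less the same as your approach (but uses unordered_map), and I wrapped/used it from Cython: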

%%cython -+
from libcpp cimport bool
cdef extern from *:
    """
    // see full code below
    """
    struct memoization_result:
        long value
        bool found

    cppclass memoization:
        memoization()
        void set_value(long, long)
        memoization_result find_value(long key)

ctypedef long(*f_type)(long)
cdef long id_fun(long x):
    return x


cdef class FunWithMemoization:
    cdef memoization mem
    cdef f_type fun
    def __cinit__(self):
        self.fun = id_fun

    cpdef long evaluate(self, long x):
        cdef memoization_result look_up = self.mem.find_value(x)
        if look_up.found:
            return look_up.value
        cdef long val = self.fun(x)
        self.mem.set_value(x, val)
        return val

I have used id_fun to default-initialize the fun member, but further functionality is needed to make FunWithMemoization useful, for example:

import time
cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2

def create_f_with_memoization():
    fun = FunWithMemoization()
    fun.fun = f
    return fun

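Obviously there are other ways to create a useful FunWithMemoization: one could use ctypes to get the address of a function, or follow the recipe linked from the original answer.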
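As a rough sketch of the ctypes route, the snippet below obtains the raw address of a C-callable with the signature long(*)(long) and checks that the address round-trips. The names F_TYPE, py_square, callback and address are illustrative. Handing the integer to FunWithMemoization would additionally require a setter that casts it to f_type, which the class shown here does not have; that part is an assumption and is left out.

```python
import ctypes

# C signature long(*)(long), matching the f_type typedef
F_TYPE = ctypes.CFUNCTYPE(ctypes.c_long, ctypes.c_long)


def py_square(x):
    return x * x


# Wrap the Python function in a C-callable thunk. `callback` must stay
# alive for as long as the raw address is in use.
callback = F_TYPE(py_square)
address = ctypes.cast(callback, ctypes.c_void_p).value

# Rebuild a callable from the raw address to check the round trip.
same_function = F_TYPE(address)
print(same_function(7))  # 49
```

A function reached this way still runs Python code on every cache miss, so the approach buys flexibility and not speed.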

Now:

f = create_f_with_memoization()
# first time really calculated:
%timeit -r 1 -n 1 f.evaluate(2)
# 10.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# second time - from memoization:
%timeit -r 1 -n 1 f.evaluate(2)
# 1.4 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Full code:

%%cython -+
from libcpp cimport bool
cdef extern from *:
    """
    #include <unordered_map>
    #include <queue>

    struct memoization_result{
        long value;
        bool found;
    };

    class memoization{
    private:
        std::unordered_map<long, long> map;
        std::queue<long> key_order;
        size_t max_size;
    public:
        memoization(): max_size(128){}
        void set_value(long key, long val){
            // assumes key isn't yet in map
            map[key] = val;
            key_order.push(key);
            if(key_order.size() > max_size){
                // evict the oldest key from the map as well as the queue,
                // otherwise the map would grow without bound
                map.erase(key_order.front());
                key_order.pop();
            }
        }
        memoization_result find_value(long key) const{
            auto it = map.find(key);
            if(it == map.cend()){
                return {0, false};
            }
            else{
                return {it->second, true};
            }
        }
    };
    """
    struct memoization_result:
        long value
        bool found

    cppclass memoization:
        memoization()
        void set_value(long, long)
        memoization_result find_value(long key)

ctypedef long(*f_type)(long)
cdef long id_fun(long x):
    return x


cdef class FunWithMemoization:
    cdef memoization mem
    cdef f_type fun
    def __cinit__(self):
        self.fun = id_fun

    cpdef long evaluate(self, long x):
        cdef memoization_result look_up = self.mem.find_value(x)
        if look_up.found:
            return look_up.value
        cdef long val = self.fun(x)
        self.mem.set_value(x, val)
        return val


import time
cdef long f(long x):
    """Expensive function"""
    time.sleep(0.01)
    return x**2

def create_f_with_memoization():
    fun = FunWithMemoization()
    fun.fun = f
    return fun

Regarding python - Cython FIFO cache for function results, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62159140/
