python-2.7 - Speeding up thousands of set operations with Cython

Reposted. Author: 行者123. Updated: 2023-12-03 08:02:15

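I have been trying to get over my fear of Cython (fear because I know nothing about C or C++).

I have a function that takes two arguments: a set (call it testSet) and a list of sets (call it targetSets). The function iterates over targetSets, computes the length of the intersection of each one with testSet, appends that value to a list, and returns the list.

This is not that slow by itself, but the problem is that I need to run simulations of testSet (a large number of them, ~10,000), and targetSets is about 10,000 sets long.

So for a small number of simulations to test with, the pure Python implementation takes about 50 seconds.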
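
The pure-Python version is not shown in the question. A minimal sketch of what it might look like, assuming it mirrors the Cython function shown further down (including recording overlaps of 0 or 1 as 0); the name compute_overlap_py is made up for illustration:

```python
def compute_overlap_py(test_set, target_sets):
    """Return one overlap count per target set; overlaps of 0 or 1 are recorded as 0."""
    obs_overlaps = []
    for target in target_sets:
        n = len(test_set & target)  # size of the intersection
        obs_overlaps.append(n if n > 1 else 0)
    return obs_overlaps


# Tiny illustrative input, not data from the question.
example = compute_overlap_py({1, 2, 3}, [{1, 2}, {3, 9}, {7, 8}, {1, 2, 3, 4}])
```

Nearly all of the time here goes into building a new set for each intersection, which is also what the Cython version spends its time on.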

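I tried writing a Cython function, and it works; it now runs in about 16 seconds.

If anyone can think of anything else I could do to the Cython function, that would be great (Python 2.7, by the way).

Here is my Cython implementation, in overlapFunc.pyx: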

def computeOverlap(set testSet, list targetSets):
    cdef list obsOverlaps = []
    cdef int i, N
    cdef set overlap
    N = len(targetSets)
    for i in range(N):
        overlap = testSet & targetSets[i]
        if len(overlap) <= 1:
            obsOverlaps.append(0)
        else:
            obsOverlaps.append(len(overlap))
    return obsOverlaps

setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("overlapFunc",
                         ["overlapFunc.pyx"])]

setup(
    name = 'computeOverlap function',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)

And some code that builds random sets for testing and timing the function, in test.py:
import numpy as np
from overlapFunc import computeOverlap
import time

def simRandomSet(n):
    for i in range(n):
        simSet = set(np.random.randint(low=1, high=100, size=50))
        yield simSet


if __name__ == '__main__':
    np.random.seed(23032014)
    targetSet = [set(np.random.randint(low=1, high=100, size=50)) for i in range(10000)]

    simulatedTestSets = simRandomSet(200)
    start = time.time()
    for i in simulatedTestSets:
        obsOverlaps = computeOverlap(i, targetSet)
    print time.time()-start
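
One further option, which comes from neither the question nor the answer: because the test script draws every element from the integers 1 to 99, each set could be encoded as a 0/1 membership row, and all overlaps for one test set computed with a single NumPy matrix-vector product. A rough sketch under that assumption; to_membership_matrix, compute_overlap_np and universe_size are hypothetical names, and whether this beats the Cython version would need to be timed:

```python
import numpy as np


def to_membership_matrix(sets, universe_size):
    """Encode each set as a 0/1 row: column j is 1 when integer j is in the set."""
    matrix = np.zeros((len(sets), universe_size), dtype=np.int32)
    for row, members in enumerate(sets):
        cols = np.fromiter(members, dtype=np.intp, count=len(members))
        matrix[row, cols] = 1
    return matrix


def compute_overlap_np(test_set, target_matrix):
    """Return intersection sizes against every target row; sizes of 0 or 1 become 0."""
    test_row = to_membership_matrix([test_set], target_matrix.shape[1])[0]
    counts = target_matrix.dot(test_row)  # one dot product per target set
    counts[counts <= 1] = 0               # same thresholding as computeOverlap
    return counts


# Assumes every element is an integer in [0, universe_size).
targets = to_membership_matrix([{1, 2, 3}, {4, 5}, {5}], universe_size=6)
overlaps = compute_overlap_np({2, 3, 4}, targets)
```

The target matrix is built once and reused for every simulated test set. This only applies when the elements are small non-negative integers with a known upper bound; for arbitrary hashable elements the set-based version is still needed.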

I tried changing the def at the start of the computeOverlap function, as follows:
cdef list computeOverlap(set testSet, list targetSets):

But I get the following warning message when I run the setup.py script:
'__pyx_f_11overlapFunc_computeOverlap' defined but not used [-Wunused-function]

Then, when I run something that tries to use the function, I get an ImportError:
    from overlapFunc import computeOverlap
ImportError: cannot import name computeOverlap

Thanks in advance for your help,

Cheers,

Davy

Best answer

In the following line, the extension module name and the source filename do not match the actual filename:

ext_modules = [Extension("computeOverlapWithGeneList",
                         ["computeOverlapWithGeneList.pyx"])]

Replace it with:
ext_modules = [Extension("overlapFunc",
                         ["overlapFunc.pyx"])]
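
The kind of mismatch the answer describes can be caught before building with a small sanity check. This is only an illustrative sketch: extension_name_matches_source is a hypothetical helper, not part of distutils or Cython, and it assumes a single-source extension whose module name should equal the .pyx file's base name.

```python
import os


def extension_name_matches_source(ext_name, source_path):
    """True when the last component of the extension name equals the source file's stem."""
    module_name = ext_name.split(".")[-1]  # "pkg.mod" -> "mod"
    stem = os.path.splitext(os.path.basename(source_path))[0]
    return module_name == stem


ok = extension_name_matches_source("overlapFunc", "overlapFunc.pyx")
mismatch = extension_name_matches_source("computeOverlapWithGeneList", "overlapFunc.pyx")
```

Here ok is True and mismatch is False. The second case corresponds to an extension name that differs from the overlapFunc module that test.py tries to import.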

This post, python-2.7 - Speeding up thousands of set operations with Cython, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/22586543/
