gpt4 book ai didi

python - 使用 cython 加速 python 代码

转载 作者:太空狗 更新时间:2023-10-30 00:41:22 25 4
gpt4 key购买 nike

我有一个函数,它基本上只是对一个简单定义的哈希函数进行大量调用,并进行测试以查看何时找到重复项。我需要用它做很多模拟,所以希望它尽可能快。我正在尝试使用 cython 来执行此操作。 cython 代码当前使用正常的 python 整数列表调用,其值在 0 到 m^2 范围内。

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls
def h3(int a,int b,int c,int d, int m,int x):
return (a*x**2 + b*x+c) %m
def floyd(inputx):
dupefound, nohashcalls = (0,0)
m = len(inputx)
loops = int(m*math.log(m))
for loopno in xrange(loops):
if (dupefound == 1):
break
a = random.randrange(m)
b = random.randrange(m)
c = random.randrange(m)
d = random.randrange(m)
pos = random.randrange(m)
value = inputx[pos]
listofpos = [0] * m
listofpos[pos] = 1
setofvalues = set([value])
cyclelimit = int(math.sqrt(m))
for j in xrange(cyclelimit):
pos = h3(a,b, c,d, m, inputx[pos])
nohashcalls += 1
if (inputx[pos] in setofvalues):
if (listofpos[pos]==1):
dupefound = 0
else:
dupefound = 1
print "Duplicate found at position", pos, " and value", inputx[pos]
break
listofpos[pos] = 1
setofvalues.add(inputx[pos])
return dupefound, nohashcalls

如何转换 inputx 和 listofpos 以使用 C 类型数组并以 C 速度访问数组?我可以使用任何其他加速吗? setofvalues 可以加速吗?

所以有一些东西可以比较,50 次调用 floyd() 和 m = 5000 目前在我的电脑上大约需要 30 秒。

更新:显示如何调用 floyd 的示例代码片段。

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

最佳答案

首先,您似乎必须在函数 中键入变量。 A good example of it is here.

其次,cython -a,用于“注释”,为您提供了 cython 编译器生成的代码的非常出色的分解,并以颜色编码指示有多脏(阅读:python api重)它是。在尝试优化任何内容时,此输出非常重要。

第三,现在著名的页面working with Numpy解释了如何以 C 风格快速访问 Numpy 数组数据。不幸的是,它冗长而烦人。然而,我们很幸运,因为最近的 Cython 提供了 Typed Memory Views ,既易于使用又很棒。在尝试执行任何其他操作之前,请阅读整页内容。

十分钟左右后,我想到了这个:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
# Type the variables in the scope of the function.
cdef int a,b,c,d, value, cyclelimit
cdef unsigned int dupefound = 0
cdef unsigned int nohashcalls = 0
cdef unsigned int loopno, pos, j

# `m` has type int because inputx is already a Cython memory view and
# `infer-types` is on.
m = inputx.shape[0]

cdef unsigned int loops = int(m*math.log(m))

# Again using the memory view, but letting Numpy allocate an array of zeros.
cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

# Keep this random sampling out of the loop
cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

for loopno in range(loops):
if (dupefound == 1):
break

# From our precomputed array
a = randoms[loopno, 0]
b = randoms[loopno, 1]
c = randoms[loopno, 2]
d = randoms[loopno, 3]
pos = randoms[loopno, 4]

value = inputx[pos]

# Unforunately, Memory View does not support "vectorized" operations
# like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
for j in range(m):
listofpos[j] = 0

listofpos[pos] = 1
setofvalues = set((value,))
cyclelimit = int(math.sqrt(m))
for j in range(cyclelimit):
pos = h3(a, b, c, d, m, inputx[pos])
nohashcalls += 1
if (inputx[pos] in setofvalues):
if (listofpos[pos]==1):
dupefound = 0
else:
dupefound = 1
print "Duplicate found at position", pos, " and value", inputx[pos]
break
listofpos[pos] = 1
setofvalues.add(inputx[pos])
return dupefound, nohashcalls

docs.cython.org 上没有解释的技巧,这是我自己学习它们的地方,但有助于将它们结合在一起。

对您的原始代码最重要的更改在注释中,但它们都相当于为 Cython 提供有关如何生成不使用 Python API 的代码的提示。

顺便说一句:我真的不知道为什么 infer_types 默认没有打开。它让编译器尽可能隐式使用 C 类型而不是 Python 类型,这意味着您的工作量更少。

如果您对此运行 cython -a,您将看到调用 Python 的唯一行是对 random.sample 的调用,以及构建或添加到 Python set()。

在我的机器上,您的原始代码运行时间为 2.1 秒。我的版本运行时间为 0.6 秒。

下一步是让 random.sample 脱离循环,但我会把它留给你。

我编辑了我的答案以演示如何预先计算兰特样本。这将时间缩短为 0.4 秒

关于python - 使用 cython 加速 python 代码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13959281/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com