gpt4 book ai didi

python - 使用 Numpy 高效求和复杂矩阵乘积

转载 作者:太空狗 更新时间:2023-10-29 20:59:59 24 4
gpt4 key购买 nike

我有一个矩阵 X,我正在为其计算中间矩阵乘积的加权和。这是一个最小的可重现示例:

import numpy as np

random_state = np.random.RandomState(1)
n = 5
p = 10

X = random_state.rand(p, n) # 10x5
X_sum = np.zeros((n, n)) # 5x5

# The length of weights are not related to X's dims,
# but will always be smaller
y = 3
weights = random_state.rand(y)

for k in range(y):
X_sum += np.dot(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k]

这工作正常并产生了我期望的结果。然而,随着 ny 的大小增长(成百上千),这变得非常昂贵,因为重复计算矩阵乘积并不是非常有效......

然而,产品的计算方式有一个明显的模式:

First iteration

Second iteration

您可以看到随着迭代的进行,Xt 中的起始列切片向右移动,而 X 中的结束行向上移动。这是第 N 次迭代的样子:

Nth iteration

这实际上意味着重复乘以相同值的子集(参见edit 2),在我看来这可能是一个利用的机会......(即,如果我要一次手动计算乘积)。

但我希望不必手动执行任何操作,并且可能有一种很好的方法可以使用 Numpy 更优雅地实现整个循环。

编辑 1

一组现实的数字:

n = 400
p = 2000
y = 750

编辑2

解决评论:

Could you explain what values are repeatedly multiplied?

考虑以下数组:

n = p = 5
X = np.arange(25).reshape(p, n)

对于 k=0,第一个产品将在 AB 之间:

k = 0
A = X.T[:, k + 1:]
B = X[:p - (k + 1), :]
>>> A
array([[ 5, 10, 15, 20],
[ 6, 11, 16, 21],
[ 7, 12, 17, 22],
[ 8, 13, 18, 23],
[ 9, 14, 19, 24]])
>>> B
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])

k=1 时:

k = 1
>>> A
array([[10, 15, 20],
[11, 16, 21],
[12, 17, 22],
[13, 18, 23],
[14, 19, 24]])
>>> B
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])

因此,如果有意义的话,每个后续矩阵乘积都是前一个乘积的子集。

最佳答案

TLDR;我会选择@Parfait 基于对 npy 的各种值进行基准测试而使用 test_gen_sum .为了连续性,在这里保留旧答案

评估npy如何影响算法的选择

此分析是使用@Parfait 的函数完成的,作为确定是否真的存在一个 最佳解决方案或是否存在基于n 值的一系列解决方案的方法、py

import numpy as np
import pytest # This code also requires the pytest-benchmark plugin

def test_for_sum(n, p, y):
random_state = np.random.RandomState(1)
X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)

for k in range(y):
X_sum += np.dot(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k]

return X_sum


def test_list_sum(n, p, y):
random_state = np.random.RandomState(1)

X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)

matrix_list = [np.dot(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k] for k in range(y)]

X_sum = np.sum(matrix_list, axis=0)

return X_sum


def test_reduce_sum(n, p, y):
random_state = np.random.RandomState(1)

X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)

matrix_list = [(X.T[:, k + 1:] @
X[:p - (k + 1), :]) * weights[k] for k in range(y)]

X_sum = reduce(lambda x,y: x + y, matrix_list)

return X_sum


def test_concat_sum(n, p, y):
random_state = np.random.RandomState(1)

X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)

x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
X[:p - (k + 1), :]) for k in range(y)])

wgt_mat = np.concatenate([np.full((n,1), weights[k]) for k in range(y)])

mul_res = x_mat * wgt_mat
X_sum = mul_res.reshape(-1, n, n).sum(axis=0)

return X_sum


def test_matmul_sum(n, p, y):
random_state = np.random.RandomState(1)
X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)
# Use list comprehension and np.matmul
matrices_list = [np.matmul(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k] for k in range(y)]

# Sum matrices in list of matrices to get the final result
X_sum = np.sum(matrices_list, axis=0)

return X_sum


def test_gen_sum(n, p, y):
random_state = np.random.RandomState(1)

X = random_state.rand(p, n)
X_sum = np.zeros((n, n))

# The length of weights are not related to X's dims,
# but will always be smaller
weights = random_state.rand(y)

matrix_gen = (np.dot(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k] for k in range(y))

X_sum = sum(matrix_gen)

return X_sum

parameters = [
pytest.param(400, 800, 3)
,pytest.param(400, 2000, 3)
,pytest.param(400, 800, 750)
,pytest.param(400, 2000, 750)
]

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_for_sum(benchmark, n, p, y):
benchmark(test_for_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_list_sum(benchmark, n, p, y):
benchmark(test_list_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_reduce_sum(benchmark, n, p, y):
benchmark(test_reduce_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_concat_sum(benchmark, n, p, y):
benchmark(test_concat_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_matmul_sum(benchmark, n, p, y):
benchmark(test_matmul_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_gen_sum(benchmark, n, p, y):
benchmark(test_gen_sum, n=n, p=p, y=y)
  • n=400p=800y=3(100 次迭代)

    • 获胜者:test_gen_sum enter image description here
  • n=400p=2000y=3(100 次迭代)

    • 获胜者:test_gen_sum enter image description here
  • n=400, p=800, y=750(10 次迭代)

    • 获胜者:test_gen_sum enter image description here
  • n=400, p=2000, y=750(10 次迭代)

    • 获胜者:test_gen_sum enter image description here

旧答案

较小的 y

我肯定会使用 np.matmul而不是 np.dot 这将使您获得最大的性能提升,事实上 np.dot 的文档将引导您使用 np.matmul 代替 np.dot 进行二维数组乘法。

我测试了 np.dotnp.matmul 使用和不使用列表理解以及 pytest-benchmark结果在这里:

y=3

顺便说一句,pytest-benchmark 非常灵活,我强烈推荐在这种情况下使用它来验证方法是否真正高效。

仅使用列表推导式对 np.matmul 结果的影响几乎可以忽略不计,对 np.dot(尽管这是更好的形式)的方案有负面影响事情,但两种变化的结合产生了最好的结果。我会警告说,使用列表理解往往会提高标准。开发者运行时,因此与仅使用 np.matmul 相比,您可能会看到更大的运行时性能范围。

代码如下:

import numpy as np

def test_np_matmul_list_comprehension():
random_state = np.random.RandomState(1)
n = p = 1000
X = np.arange(n * n).reshape(p, n)

# The length of weights are not related to X's dims,
# but will always be smaller
y = 3
weights = [1, 1, 1]
# Use list comprehension and np.matmul
matrices_list = [np.matmul(X.T[:, k + 1:],
X[:p - (k + 1), :]) * weights[k] for k in range(y)]

# Sum matrices in list of matrices to get the final result
X_sum = np.sum(matrices_list, axis=0)

更大的y

对于较大的 y 值,您最好不要使用列表理解。在这两种情况下,np.dotnp.matmul 的平均/中值运行时间往往更大。以下是 (n=500,p=5000,y=750) 的 pytest-benchmark 结果:

enter image description here

这可能有点矫枉过正,但我​​宁愿过于乐于助人 :)。

关于python - 使用 Numpy 高效求和复杂矩阵乘积,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53981660/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com