python - Element-wise maximum of a sparse SciPy matrix and a vector, with broadcasting

Reposted by: 行者123 · Last updated: 2023-12-03 13:56:18

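I need a fast element-wise maximum that compares every row of an n×m SciPy sparse matrix, element by element, with a sparse 1×m matrix. In NumPy this works perfectly through broadcasting, using np.maximum(mat, vec).
However, SciPy's .maximum() does not broadcast. My matrix is large, so I cannot convert it to a NumPy array.
My current workaround is to loop over the many rows of mat with mat[row,:].maximum(vec). This big loop is ruining the efficiency of my code (it has to be done many times). My slow solution is in the second code snippet below. Is there a better solution?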

# Example
import numpy as np
from scipy import sparse

mat = sparse.csc_matrix(np.arange(12).reshape((4,3)))

vec = sparse.csc_matrix([-1, 5, 100])

# Numpy's np.maximum() gives the **desired result** using broadcasting (but it can't handle sparse matrices):
numpy_result = np.maximum( mat.toarray(), vec.toarray() )
print( numpy_result )
# [[  0   5 100]
#  [  3   5 100]
#  [  6   7 100]
#  [  9  10 100]]

# Scipy only compares the top row of mat to vec (no broadcasting!):
scipy_result = mat.maximum(vec)
print( scipy_result.toarray() )
# [[  0   5 100]
#  [  3   4   5]
#  [  6   7   8]
#  [  9  10  11]]

#Reversing the order of mat and vec in the call to vec.maximum(mat) results in a single row output, and also frequently seg faults (!):

Larger example and current solution, for speed testing

import numpy as np
from scipy import sparse
import timeit

mat = sparse.csc_matrix(  sparse.random(20000, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

vec = sparse.csc_matrix(  sparse.random(1, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

def sparse_elementwise_maximum(mat, vec):
    output = sparse.lil_matrix(mat.shape)
    for row_idx in range( mat.shape[0] ):
        output[row_idx] = mat[row_idx,:].maximum(vec)
    return output

# Time it
num_timing_loops = 3.0
starttime = timeit.default_timer()
for _ in range(int(num_timing_loops)):
    sparse_elementwise_maximum(mat, vec)
print('time per call is:', (timeit.default_timer() - starttime)/num_timing_loops, 'seconds')
# 15 seconds per call (way too slow!)

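Edit
I accepted Max's answer because the question specifically asked for a high-performance solution, and Max's solution gave huge 1000x-2500x speedups on the various inputs I tried, at the cost of more lines of code and Numba compilation. For general use, however, Daniel F's one-liner is a great solution that gave 10x-50x speedups on the examples I tried; I will probably use it for many other things.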
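Daniel F's one-liner is not reproduced on this page. As a stand-in, here is a minimal SciPy-only sketch of one way to get the broadcast result without a Python loop; it is not necessarily the one-liner referred to above. It assumes that repeating vec in every row (n times the stored entries of vec) fits in memory, and the name sparse_maximum_tiled is hypothetical.

```python
import numpy as np
from scipy import sparse

def sparse_maximum_tiled(mat, vec):
    """Row-wise maximum of a sparse n x m matrix and a sparse 1 x m row.

    Hypothetical helper: multiplies an n x 1 column of ones by vec so that
    vec is repeated in every row, then uses the same-shape sparse maximum.
    """
    n_rows = mat.shape[0]
    ones_col = sparse.csr_matrix(np.ones((n_rows, 1), dtype=vec.dtype))
    tiled = ones_col @ vec.tocsr()      # n x m, every row equals vec
    return mat.tocsr().maximum(tiled)   # shapes match, no broadcasting needed

mat = sparse.csc_matrix(np.arange(12).reshape((4, 3)))
vec = sparse.csc_matrix([-1, 5, 100])
result = sparse_maximum_tiled(mat, vec)
```

The tiled matrix stores n copies of vec, so this is only attractive while vec stays very sparse.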

Best answer

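Low-level approach
As always, you can think about how a proper sparse matrix format is built up for this operation. For CSR matrices the main components are the shape and the data, indices and indptr arrays.
With these parts of the scipy.sparse.csr object it is quite straightforward, though perhaps a bit time-consuming, to implement an efficient algorithm in a compiled language (C, C++, Cython, Python-Numba). In this implementation I used Numba, but porting it to C++ should be easy (syntax changes) and might avoid the slicing.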
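To make the CSR components concrete, the following small sketch inspects them for the 4×3 matrix from the question and shows how one row is sliced out of the flat arrays. The variable names are illustrative only.

```python
import numpy as np
from scipy import sparse

mat = sparse.csr_matrix(np.arange(12).reshape((4, 3)))

data, indices, indptr = mat.data, mat.indices, mat.indptr
# data    holds the stored values, row by row:   1, 2, 3, ..., 11
# indices holds the column of each stored value: 1, 2, 0, 1, 2, ...
# indptr  holds the row boundaries:              0, 2, 5, 8, 11

# Row r occupies the half-open range indptr[r]:indptr[r + 1].
row = 1
row_cols = indices[indptr[row]:indptr[row + 1]]
row_vals = data[indptr[row]:indptr[row + 1]]

# The same three arrays are enough to build a CSR matrix again.
rebuilt = sparse.csr_matrix((data, indices, indptr), shape=mat.shape)
```

This per-row slicing is the access pattern the implementations below rely on.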
Implementation (first try)

import numpy as np
import numba as nb
from scipy import sparse  # needed for sparse.csr_matrix in the wrapper below

# get all needed components of the csr object and create a resulting csr object at the end
def sparse_elementwise_maximum_wrap(mat,vec):
    mat_csr=mat.tocsr()
    vec_csr=vec.tocsr()

    shape_mat=mat_csr.shape
    indices_mat=mat_csr.indices
    indptr_mat=mat_csr.indptr
    data_mat=mat_csr.data
    indices_vec=vec_csr.indices
    data_vec=vec_csr.data

    res=sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,indices_vec,data_vec)
    res=sparse.csr_matrix(res, shape=shape_mat)
    return res

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data):
    data_res=[]
    indices_res=[]
    indptr_mat_res=[]

    indptr_mat_=0
    indptr_mat_res.append(indptr_mat_)

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res.append(max(mat_row_data[mat_ptr],vec_row_data[vec_ptr]))
                indices_res.append(ind_mat)
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res.append(mat_row_data[mat_ptr])
                    indices_res.append(ind_mat)
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res.append(vec_row_data[vec_ptr])
                    indices_res.append(ind_vec)
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res.append(mat_row_data[i])
                indices_res.append(mat_row_ind[i])
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res.append(vec_row_data[i])
                indices_res.append(vec_row_ind[i])
                indptr_mat_+=1
        indptr_mat_res.append(indptr_mat_)

    return np.array(data_res),np.array(indices_res),np.array(indptr_mat_res)
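The core of the function above is a two-pointer merge of one matrix row with the vector. As an illustration only, the same logic for a single row can be sketched in plain Python without Numba; merge_row_max is a hypothetical name and the inputs are ordinary lists.

```python
def merge_row_max(row_ind, row_data, vec_ind, vec_data):
    """Two-pointer merge of one sparse row with a sparse vector.

    Both index sequences must be sorted in increasing order. Missing
    entries count as 0, so a lone value is kept only when it is positive.
    """
    out_ind, out_data = [], []
    i = j = 0
    while i < len(row_ind) and j < len(vec_ind):
        if row_ind[i] == vec_ind[j]:        # value present in both
            out_ind.append(row_ind[i])
            out_data.append(max(row_data[i], vec_data[j]))
            i += 1
            j += 1
        elif row_ind[i] < vec_ind[j]:       # only the row has a value
            if row_data[i] > 0:
                out_ind.append(row_ind[i])
                out_data.append(row_data[i])
            i += 1
        else:                               # only the vector has a value
            if vec_data[j] > 0:
                out_ind.append(vec_ind[j])
                out_data.append(vec_data[j])
            j += 1
    # Drain whichever input still has entries left.
    for k in range(i, len(row_ind)):
        if row_data[k] > 0:
            out_ind.append(row_ind[k])
            out_data.append(row_data[k])
    for k in range(j, len(vec_ind)):
        if vec_data[k] > 0:
            out_ind.append(vec_ind[k])
            out_data.append(vec_data[k])
    return out_ind, out_data

# Dense row [0, 1, 2] against the vector [-1, 5, 100] from the small example.
cols, vals = merge_row_max([1, 2], [1, 2], [0, 1, 2], [-1, 5, 100])
```

Because both index lists are walked once, the cost per row is linear in the number of stored entries of the row plus those of the vector.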

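Implementation (optimized)
In this approach the lists are replaced by dynamically resized arrays. I grew the output in steps of 60 MB. When the csr object is created, the data is not copied either; only references are stored. If you want to avoid the memory overhead, you have to copy the arrays at the end.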

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data):
    mem_step=5_000_000
    #preallocate memory for 5M non-zero elements (60 MB in this example)
    data_res=np.empty(mem_step,dtype=data_mat.dtype)
    indices_res=np.empty(mem_step,dtype=np.int32)
    data_res_p=0

    indptr_mat_res=np.empty((shape_mat[0]+1),dtype=np.int32)
    indptr_mat_res[0]=0
    indptr_mat_res_p=1
    indptr_mat_=0

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        #check if resizing is necessary
        if data_res.shape[0]<data_res_p+shape_mat[1]:
            #add at least memory for another mem_step elements
            size_to_add=mem_step
            if shape_mat[1] >size_to_add:
                size_to_add=shape_mat[1]

            data_res_2   =np.empty(data_res.shape[0]   +size_to_add,data_res.dtype)
            indices_res_2=np.empty(indices_res.shape[0]+size_to_add,indices_res.dtype)
            for i in range(data_res_p):
                data_res_2[i]=data_res[i]
                indices_res_2[i]=indices_res[i]
            data_res=data_res_2
            indices_res=indices_res_2

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res[data_res_p]=max(mat_row_data[mat_ptr],vec_row_data[vec_ptr])
                indices_res[data_res_p]=ind_mat
                data_res_p+=1
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res[data_res_p]=mat_row_data[mat_ptr]
                    indices_res[data_res_p]=ind_mat
                    data_res_p+=1
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res[data_res_p]=vec_row_data[vec_ptr]
                    indices_res[data_res_p]=ind_vec
                    data_res_p+=1
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res[data_res_p]=mat_row_data[i]
                indices_res[data_res_p]=mat_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res[data_res_p]=vec_row_data[i]
                indices_res[data_res_p]=vec_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        indptr_mat_res[indptr_mat_res_p]=indptr_mat_
        indptr_mat_res_p+=1

    return data_res[:data_res_p],indices_res[:data_res_p],indptr_mat_res
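The 60 MB step and the remark about references can be checked with a short NumPy sketch. It assumes 8-byte values (float64) and int32 indices, which matches the arithmetic 5,000,000 × (8 + 4) bytes; the variable names are illustrative.

```python
import numpy as np

mem_step = 5_000_000
data_buf = np.empty(mem_step, dtype=np.float64)   # assumed 8 bytes per value
index_buf = np.empty(mem_step, dtype=np.int32)    # 4 bytes per column index
step_mb = (data_buf.nbytes + index_buf.nbytes) / 1e6

used = 1_000                  # pretend only 1000 entries were written
view = data_buf[:used]        # a slice shares memory with data_buf
compact = np.copy(view)       # independent array holding exactly `used` items

keeps_buffer_alive = view.base is data_buf
owns_its_memory = compact.base is None
```

A slice such as data_res[:data_res_p] is a view, so the whole oversized buffer stays allocated for as long as the result references it; copying at the end trades extra time for a smaller footprint.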

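Maximum memory allocated at the start
The performance and usability of this approach depend heavily on the input. It allocates the maximum possible memory up front, which can easily lead to out-of-memory errors.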

@nb.njit(cache=True)
def sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,vec_row_ind,vec_row_data,shrink_to_fit):
    max_non_zero=shape_mat[0]*vec_row_data.shape[0]+data_mat.shape[0]
    data_res=np.empty(max_non_zero,dtype=data_mat.dtype)
    indices_res=np.empty(max_non_zero,dtype=np.int32)
    data_res_p=0

    indptr_mat_res=np.empty((shape_mat[0]+1),dtype=np.int32)
    indptr_mat_res[0]=0
    indptr_mat_res_p=1
    indptr_mat_=0

    for row_idx in range(shape_mat[0]):
        mat_row_ind=indices_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]
        mat_row_data=data_mat[indptr_mat[row_idx]:indptr_mat[row_idx+1]]

        mat_ptr=0
        vec_ptr=0
        while mat_ptr<mat_row_ind.shape[0] and vec_ptr<vec_row_ind.shape[0]:
            ind_mat=mat_row_ind[mat_ptr]
            ind_vec=vec_row_ind[vec_ptr]

            #value for both matrix and vector is present
            if ind_mat==ind_vec:
                data_res[data_res_p]=max(mat_row_data[mat_ptr],vec_row_data[vec_ptr])
                indices_res[data_res_p]=ind_mat
                data_res_p+=1
                mat_ptr+=1
                vec_ptr+=1
                indptr_mat_+=1

            #only value for the matrix is present vector is assumed 0
            elif ind_mat<ind_vec:
                if mat_row_data[mat_ptr] >0:
                    data_res[data_res_p]=mat_row_data[mat_ptr]
                    indices_res[data_res_p]=ind_mat
                    data_res_p+=1
                    indptr_mat_+=1
                mat_ptr+=1

            #only value for the vector is present matrix is assumed 0
            else:
                if vec_row_data[vec_ptr] >0:
                    data_res[data_res_p]=vec_row_data[vec_ptr]
                    indices_res[data_res_p]=ind_vec
                    data_res_p+=1
                    indptr_mat_+=1
                vec_ptr+=1

        for i in range(mat_ptr,mat_row_ind.shape[0]):
            if mat_row_data[i] >0:
                data_res[data_res_p]=mat_row_data[i]
                indices_res[data_res_p]=mat_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        for i in range(vec_ptr,vec_row_ind.shape[0]):
            if vec_row_data[i] >0:
                data_res[data_res_p]=vec_row_data[i]
                indices_res[data_res_p]=vec_row_ind[i]
                data_res_p+=1
                indptr_mat_+=1
        indptr_mat_res[indptr_mat_res_p]=indptr_mat_
        indptr_mat_res_p+=1

    if shrink_to_fit:
        data_res=np.copy(data_res[:data_res_p])
        indices_res=np.copy(indices_res[:data_res_p])
    else:
        data_res=data_res[:data_res_p]
        indices_res=indices_res[:data_res_p]

    return data_res,indices_res,indptr_mat_res

# get all needed components of the csr object and create a resulting csr object at the end
def sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=True):
    mat_csr=mat.tocsr()
    vec_csr=vec.tocsr()

    shape_mat=mat_csr.shape
    indices_mat=mat_csr.indices
    indptr_mat=mat_csr.indptr
    data_mat=mat_csr.data
    indices_vec=vec_csr.indices
    data_vec=vec_csr.data

    res=sparse_elementwise_maximum_nb(indices_mat,indptr_mat,data_mat,shape_mat,indices_vec,data_vec,shrink_to_fit)
    res=sparse.csr_matrix(res, shape=shape_mat)
    return res
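Before using the preallocating variant, it may help to estimate how much it will reserve. The sketch below mirrors the max_non_zero formula from the code above; prealloc_upper_bound is a hypothetical helper, and the int32 index size is taken from the np.int32 arrays used there.

```python
import numpy as np
from scipy import sparse

def prealloc_upper_bound(mat, vec, index_dtype=np.int32):
    """Return (entries, bytes) that the preallocating variant would reserve.

    Hypothetical helper mirroring max_non_zero: every row can gain at most
    all stored entries of vec on top of the entries mat already stores.
    """
    entries = mat.shape[0] * vec.nnz + mat.nnz
    bytes_per_entry = mat.dtype.itemsize + np.dtype(index_dtype).itemsize
    return entries, entries * bytes_per_entry

# Same shapes and density as the benchmark, with default float64 values.
mat = sparse.random(20000, 4000, density=0.01, format="csr")
vec = sparse.random(1, 4000, density=0.01, format="csr")
entries, n_bytes = prealloc_upper_bound(mat, vec)
```

For the benchmark sizes this gives 1.6 million entries, roughly 19.2 MB. The bound grows with rows × stored entries of vec, so a denser vec or a much taller matrix can push it far beyond the size of the actual result.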

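Timings
Numba has a compilation overhead, or some overhead for loading the function from the cache. Exclude the first call if you want the pure run time rather than compilation plus run time.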

import numpy as np
from scipy import sparse

mat = sparse.csr_matrix(  sparse.random(20000, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )
vec = sparse.csr_matrix(  sparse.random(1, 4000, density=.01, data_rvs=lambda s: np.random.randint(0, 5000, size=s))  )

%timeit output=sparse_elementwise_maximum(mat, vec)
#for csc input
37.9 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#for csr input
10.7 s ± 90.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Daniel F
%timeit sparse_maximum(mat, vec)
164 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#low level implementation (first try)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
89.7 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#low level implementation (optimized, csr)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#low level implementation (preallocation, without copying at the end)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#low level implementation (preallocation, with copying at the end)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec)
16.5 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit res=sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=False)
14.9 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit res=sparse_elementwise_maximum_wrap(mat,vec,shrink_to_fit=True)
21.7 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#For comparison, copying the result takes
%%timeit
np.copy(res.data)
np.copy(res.indices)
np.copy(res.indptr)
7.8 ms ± 47.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
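The %timeit lines above are IPython magics. Outside IPython, the advice to ignore the first call can be followed with a small standard-library helper along these lines; time_after_warmup is a hypothetical name, and sum is only a stand-in workload.

```python
import timeit

def time_after_warmup(fn, *args, repeats=5, **kwargs):
    """Call fn once untimed, then return the best of `repeats` timed calls in seconds."""
    fn(*args, **kwargs)  # warm-up: absorbs one-off compile or cache-load cost
    timings = []
    for _ in range(repeats):
        start = timeit.default_timer()
        fn(*args, **kwargs)
        timings.append(timeit.default_timer() - start)
    return min(timings)

# Stand-in workload; in practice pass sparse_elementwise_maximum_wrap, mat, vec.
best = time_after_warmup(sum, range(100_000))
```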

Regarding "python - Element-wise maximum of a sparse SciPy matrix and a vector, with broadcasting", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64872560/
