python - 对称稀疏矩阵的高效切片-6ren

python - 对称稀疏矩阵的高效切片

转载作者：太空宇宙更新时间：2023-11-03 11:17:39

我有一个稀疏对称矩阵列表 sigma 使得

len(sigma) = N

对于所有的i,j,k,

sigma[i].shape[0] == sigma[i].shape[1] = m  # Square
sigma[i][j,k] == sigma[i][k,j]  # Symmetric

我有一个索引数组 P 这样

P.shape[0] = N
P.shape[1] = k

我的目标是使用 P[i,:] 给出的索引提取 sigma[i] 的 k x k 密集子矩阵。这可以按如下方式完成

sub_matrices = np.empty([N,k,k])
for i in range(N):
    sub_matrices[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()

但是请注意，虽然 k 很小，但 N(和 m)非常大。如果稀疏对称矩阵以 CSR 格式存储，这将花费很长时间。我觉得必须有更好的解决方案。例如，是否存在适合需要在两个维度上切片的对称矩阵的稀疏格式？

我正在使用 Python，但愿意接受任何我可以使用 Cython 接口(interface)的 C 库建议。

额外

请注意，我目前的 Cython 方法如下:

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           long[:,:] P,
                           double[:,:,:] sub_matrices):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
    """
    # Create variables for keeping code tidy
    cdef long N = P.shape[0]
    cdef long k = P.shape[1]

    cdef long i
    cdef long j
    cdef long index_pointer 
    cdef long sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef double[:] data
    cdef long[:] indices
    cdef long[:] indptr

    # Object for the ordered P
    cdef long[:] perm

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in range(N):
        # Sort the P
        perm = np.argsort(P[i,:])

        # Get the sparse matrix values
        data     = sigma[i].data
        indices  = sigma[i].indices.astype(long)
        indptr   = sigma[i].indptr.astype(long)

        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]

            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[j]]]

            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[j], perm[index_pointer]] = \
                           data[sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1

我相信 np.argsort 经常在相对较小的向量上调用时可能效率低下并且想换成 C 实现。我也没有利用可能在 N 稀疏矩阵上加速的并行处理。不幸的是，由于外部循环中存在 Python 强制转换，我不知道如何使用 prange。

另一点需要注意的是，Cython 方法似乎使用了大量内存，但我不知道它被分配到哪里。

最新版本

根据ead的建议，下面是最新版本的Cython代码

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]

    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer 
    cdef np.int32_t sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data
    cdef np.int32_t[:] indices

    cdef np.int32_t[:] indptr

    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in range(N):
        # Get the sparse matrix values
        data     = sigma[i].data
        indices  = sigma[i].indices
        indptr   = sigma[i].indptr

        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]

            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).

            if symmetric:
                index_pointer = j  # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[i, j]]]

            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[i, j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                           data[sparse_row_pointer]

                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                               data[sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1

平行版

下面是一个并行版本，虽然它似乎没有提供任何加速并且代码不再那么好看:

# See https://stackoverflow.com/questions/48805636/efficient-slicing-of-symmetric-sparse-matrices
cimport cython
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
from cython.parallel import prange

@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]

    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer 
    cdef np.int32_t sparse_row_pointer

    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data_mem_view
    cdef np.int32_t[:] indices_mem_view
    cdef np.int32_t[:] indptr_mem_view

    cdef np.float64_t **data = <np.float64_t **> malloc(N * sizeof(np.float64_t *))
    cdef np.int32_t **indices = <np.int32_t **> malloc(N * sizeof(np.int32_t *))
    cdef np.int32_t **indptr = <np.int32_t **> malloc(N * sizeof(np.int32_t *))

    for i in range(N):
        data_mem_view = sigma[i].data
        data[i] = &(data_mem_view[0])

        indices_mem_view = sigma[i].indices
        indices[i] = &(indices_mem_view[0])

        indptr_mem_view = sigma[i].indptr
        indptr[i] = &(indptr_mem_view[0])

    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)

    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0

    for i in prange(N, nogil=True):
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            #     sigma[P[i, perm[j], :]
            # against
            #     P[i,:]
            # To do this we need our sparse row vector with columns 
            #     indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            #     data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            #     P[i, perm[:]]

            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of 
            # an equality we add the data to sub_matrices[i,:,:] and 
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that 
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).

            if symmetric:
                index_pointer = j  # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[i][P[i, perm[i, j]]]

            while ((index_pointer < k) and 
                   (sparse_row_pointer < indptr[i][P[i, perm[i, j]] + 1])):
                if indices[i][sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                           data[i][sparse_row_pointer]

                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                               data[i][sparse_row_pointer]

                    # Only increment the index pointer
                    index_pointer = index_pointer + 1
                elif indices[i][sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer = index_pointer + 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer = sparse_row_pointer + 1

    # Free malloc'd data
    free(data)
    free(indices)
    free(indptr)

测试

测试代码运行

cythonize -i sparse_slice.pyx

其中 sparse_slice.pyx 是文件名。然后你可以使用这个脚本:

import time
import numpy as np
import scipy as sp
import scipy.sparse
from sparse_slice import sparse_slice_fast_cy

k = 100
N = 20000
m = 10000
samples = 20

# Create sigma matrices
## The sampling of random sparse takes a while so just do a few and 
## then populate with these.
now = time.time()
sigma_samples = []
for i in range(samples):
    sigma_samples.append(sp.sparse.rand(m, m, density=0.001, format='csr'))
    sigma_samples[-1] = sigma_samples[-1] + sigma_samples[-1].T  # Symmetric

## Now make the sigma list from these.
sigma = []
for i in range(N):
    j = np.random.randint(samples)
    sigma.append(sigma_samples[j])
print('Time to make sigma: {}'.format(time.time() - now))

# Create indexer
now = time.time()
P = np.empty([N, k]).astype(int)
for i in range(N):
    P[i, :] = np.random.choice(np.arange(m), k, replace=True)
print('Time to make P: {}'.format(time.time() - now))

# Create objects for holding the slices
sub_matrices_slow = np.empty([N, k, k])
sub_matrices_fast = np.empty([N, k, k])

# Run both slicings
## Slow
now = time.time()
for i in range(N):
    sub_matrices_slow[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
print('Time to make sub_matrices_slow: {}'.format(time.time() - now))

## Fast
symmetric = 1
now = time.time()
sparse_slice_fast_cy(sigma, P.astype(np.int32), sub_matrices_fast, symmetric)
print('Time to make sub_matrices_fast: {}'.format(time.time() - now))

assert(np.all((sub_matrices_slow - sub_matrices_fast)**2 < 1e-6))

最佳答案

现在无法测试，但有两个建议:

A) 在 i-loop 上一次对所有行进行排序:

# Object for the ordered P
cdef long[:,:] perm = np.argsort(P, axis=1)

也许您需要将 P 作为 np.ndarray[np.int64_t, ndim=2] P(或任何类型)传递以避免复制。您必须通过 perm[i,X] 而不是 perm[X] 访问数据。

B) 定义

cdef np.int32_t[:] indices
cdef np.int32_t[:] indptr

因此您不需要通过“.astype”复制数据，即

for i in range(N):
    data     = sigma[i].data
    indices  = sigma[i].indices
    indptr   = sigma[i].indptr

我认为因为 sigma[i] 有 O(m) 元素，复制是你函数的瓶颈:你得到运行时间 O(N *(m+k^2)) 而不是 `O(N*k^2) - 最好避免它。

否则功能看起来还不错。

为了让 prange 与 i 循环一起工作，您应该通过创建一种指向 data、indices 和 indptr 的第一个元素的指针数组，并在廉价的预处理步骤中填充它们。可以让它工作，但问题是并行化的 yield 有多少——很可能是这种情况，问题是内存限制的——必须看时间安排。

您还可以通过仅处理上三角矩阵来使用对称性:

  ...
  index_pointer = j #only upper triangle!
  ....
  ....
     # We can add data to sub_matrices
     #upper triangle sub-matrix:
     sub_matrices[i, perm[j], perm[index_pointer]] = \
                       data[sparse_row_pointer]
     #lower triangle sub-matrix:
     sub_matrices[i, perm[index_pointer], perm[j]] = \
                       data[sparse_row_pointer]
  ....

我会从 B) 开始，看看结果如何...

编辑:

关于内存使用:可以通过以下方式测量峰值内存使用

 /usr/bin/time -f "peak_used_memory:%M(in Kb)" python test.py

我使用 N=2000 运行我的测试并得到 (python3.6+cython0.27.1):

                             peak memory usage
only slow                       245Mb
only fast                       245Mb
slow+fast no check              402Mb
slow+fast+assert                576Mb

因此有 50Mb 的开销，两个函数使用了 200Mb，另外还有 176Mb 用于评估断言。对于 N 的其他值，我也可以看到相同的行为。

所以我想说 cython 没有大量的内存使用。

此任务很可能(至少部分)受内存限制，因此并行化不会有太大帮助。您应该减少加载到缓存的内存量。

一种可能是不使用perm - 毕竟它也需要加载到缓存中。你可以做到，如果

您可以接受矩阵 sigma 中的任何行/列排列，而不仅仅是对 P 进行排序并使用它。
每行的元素很少，因此对每个元素进行线性搜索就可以了。
对每个元素进行二分查找

我想在最好的情况下你可以赢大约 20-30%。

有时 cython 生成的代码不容易针对 c 编译器进行优化，直接用 C 编写然后用 python 包装它通常会取得更好的结果。

但只有当此操作确实是您程序的瓶颈时，我才会这样做。

顺便声明一下

cdef np.int64_t[:,:] perm = np.argsort(P, axis=1)

您不需要额外复印。

关于python - 对称稀疏矩阵的高效切片，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/48805636/

文章推荐： c# - 绑定(bind)到 TabControl SelectedIndex

文章推荐： c# - 我如何在 Spring.NET 中使用 EmailValidator？

java - 如何更改字符串中单词的位置(对称)
这个问题已经有答案了: Reverse the ordering of words in a string (48 个回答) 已关闭 4 年前。我想更改字符串中单词的位置。它需要对称变化。示例 m
security - 对称 key 存储
我的公司将为客户存储敏感数据，并将使用托管 .NET 加密算法类之一来加密数据。大部分工作已经完成，但我们还没有弄清楚如何/在哪里存储 key 。我已经做了一些简单的搜索和阅读，看起来硬件解决方案可能
postgresql - 对称 DS 字节阈值
我在 postgres 和对称 ds 的默认配置中使用对称 ds。我总是收到以下错误。 2017-12-20 09:59:53,372 INFO [SymmetricLauncher] [Wrap
postgresql - 对称 ds 问题
我正在使用 postgresql8.3 并在我的应用程序中包含 symmetris ds 1.5.1。但客户端到服务器的复制工作正常。但复制不是从服务器到客户端完成的。我是使用对称 ds 的新手。任何
java - 对称 key 加密算法
我正在寻找一种与 JavaScript 和 Java 兼容的安全对称 key 加密算法。我已经尝试实现一个，但我遇到了一些编码问题。最佳答案您不想使用 JavaScript 加密，especia
algorithm - 对称 DDA 是什么意思？
我读过 DDA .但我刚刚遇到symmetric DDA 这个术语。它是什么？它与 DDA 有何不同？最佳答案 DDA(数字差分分析仪)算法用于找出任意给定两点之间的线性插值点(即直线)。现在，由
Spring Cloud Config 对称 key
我已经使用 Spring Cloud Config Server 设置了一个简单的项目，我正在尝试简单地加密和解密一些值。我使用以下带有 Spring Boot 的 pom.xml 将项目创建为 Sp
java - 对称 DS 和 Java
我需要通过扩展 Symmetric DS 提供的接口(interface)来扩展它的功能。有谁知道开发流程应该是什么？在文档中，它只解释了将 JAR 文件(包含扩展接口(interface)的类)放在
c++ - 可以搜索相同成本的值和键的 map ？对称 map ？
我将在中最多包含 50 个条目 map 。这样做的原因是我在初始握手后使用的协议(protocol)通过数字引用字符串名称 - 我假设服务器上必须存在与我的类似的 map 。我想要的是一个可以搜
c++ - 如何测试一个集合是否自反、对称、反对称和/或传递？
我在尝试编写这些函数时遇到困难。他们工作不正常，不知道我做错了什么。至于 Transitive，我什至无法开始，希望你能提供任何帮助，以及我在我的功能中做错了什么。谢谢。示例输入: 0 1 2 3
c# - .Net 中(对称)加密的最佳实践？
什么是加密 SQL 数据库中某些敏感或个人身份数据的“最佳实践”(根据 PCI、HIPAA 或其他适用的合规性标准)？这里有很多关于解决方案各个方面的问题，但我还没有看到任何在高层次上讨论该方法的问
css - 在 CSS 中创建一个六边形，对称
我必须创建一个六边形，我真的希望它是完整的 HTML 和 CSS。它几乎完成了，除了它不是完全对称的。左 Angular 与右 Angular 不对齐。当前的CSS: .hexagon.outer {
sip - 什么时候需要TURN？对称 NAT 和端口限制 NAT
我发现:“唯一需要 TURN 的情况是当其中一个对等点位于对称 NAT 后面，而另一个对等点位于对称 NAT 或端口限制 NAT 后面时。”那么，对称 NAT 后面的对等点如何连接后面的另一个点(例如
arrays - 按函数对(对称)numpy 二维数组进行排序。 (规范)
如何有效地按行的范数对矩阵进行排序(使用 numpy.ndarrays)？我想对矩阵 A 进行排序: A = np.array( ( [ 10, 1, 6, 3 ],
c - MBED TLS 对称 key 包装
我正在尝试使用 MBED TLS 加密函数来解开已使用我拥有的对称 key 使用 AES-128 key 包装进行加密的 key 。我是加密新手，我的理解是 key 包装/解开与加密/解密不同。这是
java - 无法从文本文件中解密密文，对称 key 实现。在java中
所以基本上我的程序从用户选择的文本文件中加密/解密字符串。他可以选择五种算法之一。问题是当我用例如创建密文时。 AES然后将此密文保存到文本文件中，并想解密它以获取原始字符串，这是行不通的。有人可以指
c++ - 我怎样才能最有效地映射 OpenCL 中厄米特(对称)矩阵的内核范围？
我正在开展一个 OpenCL 项目以生成非常大的厄尔米特(对称)矩阵，并且我正在尝试确定生成工作 ID 的最佳方式。厄密矩阵沿对角线对称，因此 M(i,j) = M*(j,i)。在暴力方式下，fo
css - 样式 div 在两行中与 flexbox 对称
我想让底部圆圈对称，这意味着我希望第 5 个圆圈介于第 1 和第 2 个(但仍在下方)之间，第 7 个圆圈介于第 3 和第 4 个之间。我在 v-for 循环中显示这个圆圈。我将它们全部放在一个容器
c++ - 固定维数(N=9)、对称、半正定稠密线性系统的快速解
对于固定维数 (N=9) 的稠密线性系统(矩阵是对称的，半正定的)的快速求解，您会推荐哪种算法？高斯消元法 LU分解 Cholesky 分解等等？类型是 32 位和 64 位 float 。这
Python，许多 3x3、奇异、对称、矩阵的同时伪反转
我有一个尺寸为行 x 列 x 深度的 3D 图像。对于图像中的每个体素，我计算了一个 3x3 实对称矩阵。它们存储在数组 D 中，因此具有形状 (rows, cols, deps, 6)。 D 为图像

太空宇宙

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

python - 对称稀疏矩阵的高效切片