python - Deprecation status of the NumPy matrix class

Reposted. Author: 行者123. Updated: 2023-12-02 07:55:06

What is the status of the matrix class in NumPy?

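I keep being told that I should use the ndarray class instead. Is it worth using, and is it safe to use, the matrix class in new code I write? I don't understand why I should use ndarrays instead.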

Best answer

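tl;dr: the numpy.matrix class is being deprecated. Some high-profile libraries depend on the class (the largest being scipy.sparse), which hinders a proper short-term deprecation, but users are strongly encouraged to use the ndarray class (usually created with the numpy.array convenience function) instead. With the introduction of the @ operator for matrix multiplication, many of the relative advantages of matrices have been removed.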

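Why (not) the matrix class?
numpy.matrix is a subclass of numpy.ndarray. It was originally meant for convenient use in computations involving linear algebra, but matrices are both more limited than instances of the more general array class and behave in surprisingly different ways. Examples of fundamental differences in behaviour: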

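  • Shapes: arrays can have an arbitrary number of dimensions, ranging from 0 to infinity (or 32). Matrices are always two-dimensional. Oddly enough, while a matrix can't be created with more dimensions, it is possible to inject singleton dimensions into a matrix and end up with what is technically a multidimensional matrix: np.matrix(np.random.rand(2,3))[None,...,None].shape == (1,2,3,1) (not that this is of any practical importance).
  • Indexing: indexing an array can give you arrays of any size, depending on how you are indexing it. Indexing expressions on matrices always give you a matrix. This means that both arr[:,0] and arr[0,:] give you a 1D ndarray for a 2D array, while mat[:,0] has shape (N,1) and mat[0,:] has shape (1,M) in the case of a matrix.
  • Arithmetic operations: the main reason for using matrices in the old days was that arithmetic operations on matrices (in particular, multiplication and power) perform matrix operations (matrix multiplication and matrix power). The same operations on arrays are elementwise multiplication and power. Consequently, mat1 * mat2 is valid if mat1.shape[1] == mat2.shape[0], whereas arr1 * arr2 is valid if arr1.shape == arr2.shape or, more generally, if the two shapes can be broadcast together (and of course the result means something completely different). Also, surprisingly, mat1 / mat2 performs elementwise division of the two matrices. This behaviour is probably inherited from ndarray, but it makes no sense for matrices, especially in light of the meaning of *.
  • Special attributes: matrices have a few handy attributes in addition to what arrays have. mat.A and mat.A1 are array views with the same values as np.array(mat) and np.array(mat).ravel(), respectively. mat.T and mat.H are the transpose and the conjugate transpose (adjoint) of the matrix; arr.T is the only such attribute that exists for the ndarray class. Finally, mat.I is the inverse matrix of mat.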
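
The indexing and arithmetic differences listed above can be checked directly. The following is only a small sketch; the variable names are chosen for illustration and are not part of the original answer.

```python
import numpy as np

arr = np.arange(1, 7).reshape(2, 3)   # plain ndarray, shape (2, 3)
mat = np.matrix(arr)                  # same data as a matrix

# Indexing: arrays drop the indexed axis, matrices stay two-dimensional.
arr_col_shape = arr[:, 0].shape       # (2,)
mat_col_shape = mat[:, 0].shape       # (2, 1)
mat_row_shape = mat[0, :].shape       # (1, 3)

# Arithmetic: `*` is elementwise for arrays but a matrix product for matrices.
elementwise = arr * arr               # shape (2, 3)
product = mat * mat.T                 # (2, 3) times (3, 2) gives shape (2, 2)

# Division stays elementwise even for matrices.
ratio = mat / mat                     # all ones, shape (2, 3)

print(arr_col_shape, mat_col_shape, mat_row_shape)
print(elementwise.shape, product.shape, ratio.shape)
```

Note that mat * mat would fail here, because a (2, 3) matrix cannot be matrix-multiplied by another (2, 3) matrix, while arr * arr is fine.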

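It is easy enough to write code that works either for ndarrays or for matrices. But when there is a chance that the two classes have to interact in code, things start to become difficult. In particular, a lot of code could work naturally for subclasses of ndarray, but matrix is an ill-behaved subclass that can easily break code that tries to rely on duck typing. Consider the following example using arrays and matrices of shape (3,4):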
    import numpy as np

    shape = (3, 4)
    arr = np.arange(np.prod(shape)).reshape(shape) # ndarray
    mat = np.matrix(arr) # same data in a matrix
    print((arr + mat).shape) # (3, 4), makes sense
    print((arr[0,:] + mat[0,:]).shape) # (1, 4), makes sense
    print((arr[:,0] + mat[:,0]).shape) # (3, 3), surprising

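Adding slices of the two objects gives catastrophically different results depending on the dimension along which we slice. Addition on both matrices and arrays happens elementwise when the shapes are the same. The first two cases above are intuitive: we add two arrays (matrices), then we add two rows, one from each. The last case is really surprising: we probably meant to add two columns and ended up with a matrix. The reason, of course, is that arr[:,0] has shape (3,), which is compatible with shape (1,3), but mat[:,0] has shape (3,1). The two are broadcast together to shape (3,3).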
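
To make the broadcasting step explicit, here is a small sketch (variable names are illustrative only) that reproduces the surprising (3,3) result. It also shows one possible way to avoid it, namely converting the matrix to a plain array with np.asarray before slicing.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
mat = np.matrix(arr)

col_arr = arr[:, 0]   # shape (3,): behaves like a row of shape (1, 3) when broadcasting
col_mat = mat[:, 0]   # shape (3, 1): stays a column

# (3,) and (3, 1) broadcast to (3, 3): every pair of entries is summed.
mixed = col_arr + col_mat

# Converting the matrix to a plain array first and indexing both objects
# the same way keeps the result one-dimensional.
plain = col_arr + np.asarray(mat)[:, 0]

print(mixed.shape)   # (3, 3)
print(plain)         # [ 0  8 16]
```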

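Finally, the largest advantage of the matrix class (i.e. the possibility to concisely formulate complicated matrix expressions involving a lot of matrix products) was removed when the @ matmul operator was introduced in python 3.5, first implemented in numpy 1.10. Compare the computation of a simple quadratic form: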
    v = np.random.rand(3); v_row = np.matrix(v)
    arr = np.random.rand(3,3); mat = np.matrix(arr)

    print(v.dot(arr.dot(v))) # pre-matmul style
    # 0.713447037658556, yours will vary
    print(v_row * mat * v_row.T) # pre-matmul matrix style
    # [[0.71344704]]
    print(v @ arr @ v) # matmul style
    # 0.713447037658556

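Looking at the above, it is clear why the matrix class was widely preferred for linear algebra: the infix * operator made the expressions much less verbose and much easier to read. However, we get the same readability with the @ operator using modern python and numpy. Furthermore, note that the matrix case gives us a matrix of shape (1,1), which should technically be a scalar. This also implies that we can't multiply a column vector by this "scalar": (v_row * mat * v_row.T) * v_row.T in the above example raises an error, because matrices with shapes (1,1) and (3,1) can't be multiplied in this order.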
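
The "scalar" problem can be reproduced as follows. This is a sketch under the assumption of a reasonably recent numpy (np.random.default_rng is used only to make the numbers reproducible), and pulling the single entry out with .item() is just one possible workaround, not something the original answer prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random(3)
arr = rng.random((3, 3))
v_row = np.matrix(v)
mat = np.matrix(arr)

# ndarray version: the quadratic form is a true scalar, so scaling a vector works.
q_arr = v @ arr @ v
scaled = q_arr * v                  # shape (3,)

# matrix version: the result has shape (1, 1), and (1, 1) times (3, 1)
# is not a valid matrix product.
q_mat = v_row * mat * v_row.T
try:
    q_mat * v_row.T
    raised = False
except ValueError:
    raised = True

# Extracting the single entry turns the (1, 1) matrix back into a scalar.
scaled_from_mat = q_mat.item() * v

print(q_mat.shape, raised)
```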

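For completeness' sake, it should be noted that while the matmul operator fixes the most common scenario in which ndarrays are suboptimal compared to matrices, there are still a few shortcomings in handling linear algebra elegantly with ndarrays (although people still tend to believe that overall it is preferable to stick to the latter). One such example is matrix power: mat ** 3 is the proper third matrix power of a matrix (whereas it is the elementwise cube of an ndarray). Unfortunately, numpy.linalg.matrix_power is considerably more verbose. Furthermore, in-place matrix multiplication only works for the matrix class. In contrast, while both PEP 465 and the python grammar allow @= as an augmented assignment with matmul, this is not implemented for ndarrays as of numpy 1.15.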
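
A short sketch of the matrix power difference (the example matrix is arbitrary). The loop at the end rebinds the name with acc = acc @ arr, which is one way to avoid depending on @= for ndarrays in numpy versions where it is not implemented.

```python
import numpy as np

arr = np.array([[1, 1], [0, 1]])
mat = np.matrix(arr)

cube_mat = mat ** 3                          # matrix power for np.matrix
cube_arr = np.linalg.matrix_power(arr, 3)    # the same operation for ndarrays
cube_elementwise = arr ** 3                  # elementwise cube, a different result

# Repeated products without `@=`: rebind the name instead.
acc = np.eye(2, dtype=int)
for _ in range(3):
    acc = acc @ arr

print(cube_arr)
print(cube_elementwise)
```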

Deprecation history

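Considering the above complications regarding the matrix class, there have been recurring discussions of its possible deprecation for a long time. The introduction of the @ infix operator, which was a huge prerequisite for this process, happened in September 2015. Unfortunately, the advantages of the matrix class in earlier days meant that its use spread widely. There are libraries that depend on the matrix class (one of the most important dependents is scipy.sparse, which both uses numpy.matrix semantics and often returns matrices when densifying), so fully deprecating it has always been problematic.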

Already in a numpy mailing list thread from 2009 I found remarks such as

    numpy was designed for general purpose computational needs, not any one branch of math. nd-arrays are very useful for lots of things. In contrast, Matlab, for instance, was originally designed to be an easy front-end to linear algebra package. Personally, when I used Matlab, I found that very awkward -- I was usually writing 100s of lines of code that had nothing to do with linear algebra, for every few lines that actually did matrix math. So I much prefer numpy's way -- the linear algebra lines of code are longer an more awkward, but the rest is much better.

    The Matrix class is the exception to this: is was written to provide a natural way to express linear algebra. However, things get a bit tricky when you mix matrices and arrays, and even when sticking with matrices there are confusions and limitations -- how do you express a row vs a column vector? what do you get when you iterate over a matrix? etc.

    There has been a bunch of discussion about these issues, a lot of good ideas, a little bit of consensus about how to improve it, but no one with the skill to do it has enough motivation to do it.



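These reflect the benefits and difficulties arising from the matrix class. The earliest suggestion for deprecation I could find is from 2008, although it was partly motivated by unintuitive behaviour that has changed since (in particular, slicing and iterating over a matrix now result in (row) matrices, as one would most likely expect). The suggestion showed both that this is a highly controversial subject and that infix operators for matrix multiplication are crucial.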

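The next mention I could find is from 2014, which turned out to be a very fruitful thread. The ensuing discussion raises the question of handling numpy subclasses in general, which general theme is still very much on the table. There is also strong criticism: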

    What sparked this discussion (on Github) is that it is not possible to write duck-typed code that works correctly for:

    • ndarrays
    • matrices
    • scipy.sparse sparse matrixes

    The semantics of all three are different; scipy.sparse is somewhere between matrices and ndarrays with some things working randomly like matrices and others not.

    With some hyberbole added, one could say that from the developer point of view, np.matrix is doing and has already done evil just by existing, by messing up the unstated rules of ndarray semantics in Python.



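This was followed by a lot of valuable discussion of the possible futures for matrices. Even with no @ operator at the time, a lot of thought was given to the deprecation of the matrix class and how it might affect users downstream. As far as I can tell, this discussion directly led to the inception of PEP 465 introducing matmul.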

In early 2015:

    In my opinion, a "fixed" version of np.matrix should (1) not be a np.ndarray subclass and (2) exist in a third party library not numpy itself.

    I don't think it's really feasible to fix np.matrix in its current state as an ndarray subclass, but even a fixed matrix class doesn't really belong in numpy itself, which has too long release cycles and compatibility guarantees for experimentation -- not to mention that the mere existence of the matrix class in numpy leads new users astray.



Once the @ operator had been available for a while, the discussion of deprecation surfaced again, reraising the topic of the relationship between matrix deprecation and scipy.sparse.

Eventually, first action to deprecate numpy.matrix was taken in late November 2017. Regarding dependents of the class:

    How would the community handle the scipy.sparse matrix subclasses? These are still in common use.



    They're not going anywhere for quite a while (until the sparse ndarrays materialize at least). Hence np.matrix needs to be moved, not deleted.

(source) and

    while I want to get rid of np.matrix as much as anyone, doing that anytime soon would be really disruptive.

    • There are tons of little scripts out there written by people who didn't know better; we do want them to learn not to use np.matrix but breaking all their scripts is a painful way to do that

    • There are major projects like scikit-learn that simply have no alternative to using np.matrix, because of scipy.sparse.

    So I think the way forward is something like:

    • Now or whenever someone gets together a PR: issue a PendingDeprecationWarning in np.matrix.__init__ (unless it kills performance for scikit-learn and friends), and put a big warning box at the top of the docs. The idea here is to not actually break anyone's code, but start to get out the message that we definitely don't think anyone should use this if they have any alternative.

    • After there's an alternative to scipy.sparse: ramp up the warnings, possibly all the way to FutureWarning so that existing scripts don't break but they do get noisy warnings

    • Eventually, if we think it will reduce maintenance costs: split it into a subpackage



(source).

Status quo

As of May 2018 (numpy 1.15, relevant pull request and commit), the matrix class docstring contains the following note:

    It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.



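At the same time, a PendingDeprecationWarning has been added to matrix.__new__. Unfortunately, deprecation warnings are (almost always) silenced by default, so most end users of numpy will not see this strong hint.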
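
Because the warning is hidden by default, it has to be requested explicitly to be seen. A minimal sketch, assuming a numpy version (1.15 or later) in which matrix.__new__ issues the warning described above:

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # do not hide any warning category
    np.matrix([[1, 2], [3, 4]])       # constructing a matrix triggers the warning

categories = [w.category for w in caught]
for w in caught:
    print(w.category.__name__, "-", w.message)
```

For a whole script, running the interpreter with -W error::PendingDeprecationWarning should turn every such warning into an exception, which can help locate remaining uses of the matrix class.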

Finally, the numpy roadmap as of November 2018 mentions multiple related topics as one of the "tasks and features [the numpy community] will be investing resources in":

    Some things inside NumPy do not actually match the Scope of NumPy.

    • A backend system for numpy.fft (so that e.g. fft-mkl doesn’t need to monkeypatch numpy)
    • Rewrite masked arrays to not be a ndarray subclass – maybe in a separate project?
    • MaskedArray as a duck-array type, and/or
    • dtypes that support missing values
    • Write a strategy on how to deal with overlap between numpy and scipy for linalg and fft (and implement it).
    • Deprecate np.matrix


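It is likely that this state will persist as long as larger libraries and many users (and in particular scipy.sparse) rely on the matrix class. However, there is ongoing discussion about moving scipy.sparse to depend on something else. Irrespective of how the deprecation process develops, users should use the ndarray class in new code and preferably port older code if possible. Eventually, the matrix class will probably end up in a separate package, to remove some of the burdens caused by its existence in its current form.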
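
When porting older code, the matrix-only conveniences have plain ndarray counterparts. The following is only a sketch of one possible mapping; as_plain_array is a hypothetical helper introduced here for illustration and is not part of numpy.

```python
import numpy as np


def as_plain_array(obj):
    """Hypothetical helper: return a base-class ndarray for matrix or array input."""
    return np.asarray(obj)


legacy = np.matrix([[2.0, 1.0], [1.0, 3.0]])   # stands in for old matrix-based code
a = as_plain_array(legacy)

inverse = np.linalg.inv(a)   # replaces legacy.I
adjoint = a.conj().T         # replaces legacy.H
flat = a.ravel()             # replaces legacy.A1
product = a @ inverse        # replaces legacy * legacy.I

print(type(a).__name__)      # ndarray
print(product)
```

np.asarray is used rather than np.asanyarray because the latter passes subclasses such as matrix through unchanged.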

Regarding python - Deprecation status of the NumPy matrix class, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53254738/
