python - Deprecation status of the NumPy matrix class - 6ren

Reposted by: 行者123 Updated: 2023-12-02 07:55:06

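What is the status of the matrix class in NumPy?

I keep being told that I should use the ndarray class instead. Is it worthwhile, and safe, to use the matrix class in new code that I write? I don't understand why I should use ndarrays instead.

Best answer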

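tl;dr: the numpy.matrix class is being deprecated. Some high-profile libraries depend on the class (the largest one being scipy.sparse), which hinders a proper short-term deprecation, but users are strongly encouraged to use the ndarray class instead (usually created with the numpy.array convenience function). With the introduction of the @ operator for matrix multiplication, many of the relative advantages of matrices have disappeared.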

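Why (not) the matrix class?
numpy.matrix is a subclass of numpy.ndarray. It was originally meant for convenient use in computations involving linear algebra, but compared to instances of the more general array class its instances are both more limited and surprisingly different in behaviour. Examples of fundamental differences in behaviour: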

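Shapes: arrays can have an arbitrary number of dimensions, from 0 to infinity (or 32). Matrices are always two-dimensional. Oddly enough, while a matrix can't be created with more dimensions, it is possible to inject singleton dimensions into a matrix and end up with what is technically a multidimensional matrix: np.matrix(np.random.rand(2,3))[None,...,None].shape == (1,2,3,1) (not that this is of any practical importance).

Indexing: indexing an array can give you an array of any size, depending on how you are indexing it. An indexing expression on a matrix will always give you a matrix. This means that both arr[:,0] and arr[0,:] give you a 1D ndarray for a 2D array, whereas for a matrix mat[:,0] has shape (N,1) and mat[0,:] has shape (1,M).

Arithmetic operations: the main reason for using matrices in the old days was that arithmetic operations on matrices (in particular, multiplication and power) perform matrix operations (matrix multiplication and matrix power). The same operations on arrays are elementwise multiplication and power. Consequently mat1 * mat2 is valid if mat1.shape[1] == mat2.shape[0], whereas arr1 * arr2 is valid if arr1.shape == arr2.shape (and of course the result means something completely different). Surprisingly, mat1 / mat2 also performs elementwise division of the two matrices. This behaviour is probably inherited from ndarray, but it makes no sense for matrices, especially given the meaning of *.

Special attributes: matrices have a few handy attributes in addition to what arrays have. mat.A and mat.A1 are array views with the same values as np.array(mat) and np.array(mat).ravel(), respectively. mat.T and mat.H are the transpose and the conjugate transpose (adjoint) of the matrix; arr.T is the only such attribute that exists for the ndarray class. Finally, mat.I is the inverse matrix of mat.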
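The indexing and arithmetic differences listed above can be checked with a short sketch. The array contents and the names sq_arr and sq_mat are arbitrary choices for illustration, not part of the original answer.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # plain 2D ndarray
mat = np.matrix(arr)                # same data as a matrix

# Indexing: ndarray slices drop the indexed dimension, matrix slices stay 2D.
print(arr[:, 0].shape, arr[0, :].shape)   # (3,) (4,)
print(mat[:, 0].shape, mat[0, :].shape)   # (3, 1) (1, 4)

# Arithmetic: * is elementwise for ndarrays and a matrix product for matrices.
sq_arr = np.array([[1, 2], [3, 4]])
sq_mat = np.matrix(sq_arr)
elementwise = sq_arr * sq_arr       # [[1, 4], [9, 16]]
matrix_product = sq_mat * sq_mat    # [[7, 10], [15, 22]]
print(elementwise)
print(matrix_product)

# Division, however, is elementwise for both classes.
ratio = sq_mat / sq_mat             # a matrix filled with 1.0
print(ratio)
```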

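It is easy enough to write code that works either for ndarrays or for matrices. But when there is a chance that the two classes have to interact in code, things become difficult. In particular, a lot of code could work naturally for subclasses of ndarray, but matrix is an ill-behaved subclass that can easily break code that tries to rely on duck typing. Consider the following example using an array and a matrix of shape (3,4):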

import numpy as np

shape = (3, 4)
arr = np.arange(np.prod(shape)).reshape(shape) # ndarray
mat = np.matrix(arr) # same data in a matrix
print((arr + mat).shape)           # (3, 4), makes sense
print((arr[0,:] + mat[0,:]).shape) # (1, 4), makes sense
print((arr[:,0] + mat[:,0]).shape) # (3, 3), surprising

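Adding slices of the two objects gives catastrophically different results depending on the dimension along which we slice. Addition of both matrices and arrays happens elementwise when the shapes are the same. The first two cases above are intuitive: we add the two full objects, then we add one row from each. The last case is really surprising: we probably meant to add two columns, and we ended up with a matrix. The reason, of course, is that arr[:,0] has shape (3,), which is compatible with shape (1,3), whereas mat[:,0] has shape (3,1). These two are broadcast together to shape (3,3).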
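The broadcasting step can be reproduced with plain ndarrays alone. This is a minimal sketch: col_1d and col_2d are hypothetical stand-ins for arr[:,0] and mat[:,0].

```python
import numpy as np

col_1d = np.arange(3)                 # shape (3,), like arr[:, 0]
col_2d = np.arange(3).reshape(3, 1)   # shape (3, 1), like mat[:, 0]

# Broadcasting aligns shapes from the right: (3,) is treated as (1, 3),
# and (1, 3) combined with (3, 1) stretches to (3, 3).
print(np.broadcast(col_1d, col_2d).shape)   # (3, 3)

total = col_1d + col_2d               # total[i, j] == i + j
print(total.shape)                    # (3, 3)

# Flattening the column first gives the elementwise sum of the two columns.
same_shape = col_1d + col_2d.ravel()
print(same_shape)                     # [0 2 4]
```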

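Finally, the largest advantage of the matrix class (the possibility to concisely formulate complicated matrix expressions involving many matrix products) disappeared when the @ matmul operator was introduced in Python 3.5, first implemented in numpy 1.10. Compare the computation of a simple quadratic form: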

v = np.random.rand(3); v_row = np.matrix(v)
arr = np.random.rand(3,3); mat = np.matrix(arr)

print(v.dot(arr.dot(v))) # pre-matmul style
# 0.713447037658556, yours will vary
print(v_row * mat * v_row.T) # pre-matmul matrix style
# [[0.71344704]]
print(v @ arr @ v) # matmul style
# 0.713447037658556

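Looking at the above, it is clear why the matrix class was widely preferred for linear algebra: the infix * operator made expressions much less verbose and much easier to read. However, with modern Python and numpy we get the same readability from the @ operator. Furthermore, note that the matrix case gives us a matrix of shape (1,1) where we should technically get a scalar. This also implies that we can't multiply a column vector by this "scalar": (v_row * mat * v_row.T) * v_row.T in the above example raises an error, because matrices with shapes (1,1) and (3,1) can't be multiplied in this order.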
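A small sketch of the (1,1) "scalar" problem follows. It uses fixed values instead of the random data above so that the result is predictable; the identity matrix is an arbitrary choice.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
arr = np.eye(3)                 # identity, so the quadratic form is v . v = 14
v_row = np.matrix(v)            # shape (1, 3)
mat = np.matrix(arr)

q_mat = v_row * mat * v_row.T   # matrix of shape (1, 1)
q_arr = v @ arr @ v             # 0-dimensional NumPy scalar

try:
    q_mat * v_row.T             # (1, 1) times (3, 1): inner dimensions differ
    raised = False
except ValueError:
    raised = True
print(raised)                   # True

scaled = q_arr * v              # a true scalar multiplies a vector without fuss
print(scaled)                   # [14. 28. 42.]
```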

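For completeness' sake, it should be noted that while the matmul operator fixes the most common scenario in which ndarrays are suboptimal compared to matrices, there are still a few shortcomings in handling linear algebra elegantly with ndarrays (although people still tend to believe that overall it is preferable to stick to the latter). One such example is matrix power: mat ** 3 is the proper third matrix power of a matrix (whereas it is the elementwise cube of an ndarray). Unfortunately, numpy.linalg.matrix_power is considerably more verbose. Furthermore, in-place matrix multiplication only works for the matrix class. In contrast, while both PEP 465 and the Python grammar allow @= as an augmented assignment with matmul, this is not implemented for ndarrays as of numpy 1.15.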
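The matrix power difference can be seen in a few lines. The 2x2 matrix here is an arbitrary example chosen so that the two results differ visibly.

```python
import numpy as np

arr = np.array([[1, 1], [0, 1]])
mat = np.matrix(arr)

cube_elementwise = arr ** 3                    # elementwise cube of an ndarray
cube_matrix = np.linalg.matrix_power(arr, 3)   # same as arr @ arr @ arr
cube_matrix_class = mat ** 3                   # matrix power via the matrix class

print(cube_elementwise)    # [[1 1] [0 1]]
print(cube_matrix)         # [[1 3] [0 1]]
print(cube_matrix_class)   # [[1 3] [0 1]]
```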

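Deprecation history

Considering the above complications concerning the matrix class, there have been recurring discussions of its possible deprecation for a long time. The introduction of the @ infix operator, which was a huge prerequisite for this process, happened in September 2015. Unfortunately, the advantages of the matrix class in earlier days meant that its use spread widely. There are libraries that depend on the matrix class (one of the most important dependents is scipy.sparse, which both uses numpy.matrix semantics and often returns matrices when densifying), so fully deprecating it has always been problematic.

Already in a numpy mailing list thread from 2009 I found remarks such as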

numpy was designed for general purpose computational needs, not any one branch of math. nd-arrays are very useful for lots of things. In contrast, Matlab, for instance, was originally designed to be an easy front-end to linear algebra package. Personally, when I used Matlab, I found that very awkward -- I was usually writing 100s of lines of code that had nothing to do with linear algebra, for every few lines that actually did matrix math. So I much prefer numpy's way -- the linear algebra lines of code are longer an more awkward, but the rest is much better.

The Matrix class is the exception to this: is was written to provide a natural way to express linear algebra. However, things get a bit tricky when you mix matrices and arrays, and even when sticking with matrices there are confusions and limitations -- how do you express a row vs a column vector? what do you get when you iterate over a matrix? etc.

There has been a bunch of discussion about these issues, a lot of good ideas, a little bit of consensus about how to improve it, but no one with the skill to do it has enough motivation to do it.

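These reflect well the benefits and difficulties arising from the matrix class. The earliest suggestion for deprecation I could find is from 2008, although it was partly motivated by unintuitive behaviour that has changed since (in particular, slicing and iterating over a matrix now result in (row) matrices, as one would most likely expect). The suggestion showed both that this is a highly controversial subject and that infix operators for matrix multiplication are crucial.

The next mention I could find is from 2014, which turned out to be a very fruitful thread. The ensuing discussion raises the question of handling numpy subclasses in general, a general theme that is still very much on the table. There is also strong criticism: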

What sparked this discussion (on Github) is that it is not possible to write duck-typed code that works correctly for:

ndarrays

matrices

scipy.sparse sparse matrixes

The semantics of all three are different; scipy.sparse is somewhere between matrices and ndarrays with some things working randomly like matrices and others not.

With some hyberbole added, one could say that from the developer point of view, np.matrix is doing and has already done evil just by existing, by messing up the unstated rules of ndarray semantics in Python.

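This was followed by a lot of valuable discussion of the possible futures for matrices. Even with no @ operator at the time, a lot of thought was given to the deprecation of the matrix class and how it might affect downstream users. As far as I can tell, this discussion directly led to the inception of PEP 465, which introduced matmul.

In early 2015: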

In my opinion, a "fixed" version of np.matrix should (1) not be a np.ndarray subclass and (2) exist in a third party library not numpy itself.

I don't think it's really feasible to fix np.matrix in its current state as an ndarray subclass, but even a fixed matrix class doesn't really belong in numpy itself, which has too long release cycles and compatibility guarantees for experimentation -- not to mention that the mere existence of the matrix class in numpy leads new users astray.

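Once the @ operator had been available for a while, the discussion of deprecation surfaced again, reraising the topic of the relationship between matrix deprecation and scipy.sparse.

Eventually, the first action to deprecate numpy.matrix was taken in late November 2017. Regarding dependents of the class: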

How would the community handle the scipy.sparse matrix subclasses? These are still in common use.

They're not going anywhere for quite a while (until the sparse ndarrays materialize at least). Hence np.matrix needs to be moved, not deleted.

(source) and

while I want to get rid of np.matrix as much as anyone, doing that anytime soon would be really disruptive.

There are tons of little scripts out there written by people who didn't know better; we do want them to learn not to use np.matrix but breaking all their scripts is a painful way to do that

There are major projects like scikit-learn that simply have no alternative to using np.matrix, because of scipy.sparse.

So I think the way forward is something like:

Now or whenever someone gets together a PR: issue a PendingDeprecationWarning in np.matrix.__init__ (unless it kills performance for scikit-learn and friends), and put a big warning box at the top of the docs. The idea here is to not actually break anyone's code, but start to get out the message that we definitely don't think anyone should use this if they have any alternative.

After there's an alternative to scipy.sparse: ramp up the warnings, possibly all the way to FutureWarning so that existing scripts don't break but they do get noisy warnings

Eventually, if we think it will reduce maintenance costs: split it into a subpackage

(source).

Current state of affairs

As of May 2018 (numpy 1.15, relevant pull request and commit), the matrix class docstring contains the following note:

It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.

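At the same time, a PendingDeprecationWarning has been added to matrix.__new__. Unfortunately, deprecation warnings are (almost always) silenced by default, so most end users of numpy will not see this strong hint.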
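To see the silenced warning, one can temporarily override the warning filters. This is a minimal sketch using the standard warnings module, and it assumes a numpy version (1.15 or later) in which matrix.__new__ emits the warning described above.

```python
import warnings

import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # report every warning inside this block
    np.matrix([[1, 2], [3, 4]])       # creating a matrix triggers the warning

categories = [w.category for w in caught]
print(categories)
for w in caught:
    print(w.message)
```

The same kind of filter can be switched on for a whole script through the interpreter's -W option or the PYTHONWARNINGS environment variable.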

Finally, the numpy roadmap as of November 2018 mentions multiple related topics as one of the "tasks and features [the numpy community] will be investing resources in":

Some things inside NumPy do not actually match the Scope of NumPy.

A backend system for numpy.fft (so that e.g. fft-mkl doesn’t need to monkeypatch numpy)

Rewrite masked arrays to not be a ndarray subclass – maybe in a separate project?

MaskedArray as a duck-array type, and/or

dtypes that support missing values

Write a strategy on how to deal with overlap between numpy and scipy for linalg and fft (and implement it).

Deprecate np.matrix

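This state is likely to persist as long as larger libraries and many users (in particular scipy.sparse) rely on the matrix class. However, there is ongoing discussion about moving scipy.sparse to depend on something else. Irrespective of how the deprecation process develops, users should use the ndarray class in new code and preferably port older code where possible. Eventually, the matrix class will probably end up in a separate package, to remove some of the burden caused by its existence in its current form.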
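For porting older code, the matrix-only attributes described earlier have straightforward ndarray counterparts. The following is a sketch of one possible mapping, not an official migration recipe; the 2x2 matrix is an arbitrary invertible example.

```python
import numpy as np

mat = np.matrix([[2.0, 1.0], [1.0, 3.0]])   # legacy object to be ported

arr = np.asarray(mat)           # plain 2D ndarray, like mat.A
flat = arr.ravel()              # 1D ndarray, like mat.A1
adjoint = arr.conj().T          # conjugate transpose, like mat.H
inverse = np.linalg.inv(arr)    # inverse, like mat.I
product = arr @ inverse         # matrix product, like mat * mat.I

print(type(arr).__name__)       # ndarray
print(np.allclose(product, np.eye(2)))
```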

Regarding python - Deprecation status of the NumPy matrix class, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53254738/
