Python numpy: linalg.pinv() is too imprecise

Reposted by: 行者123. Updated: 2023-12-01 01:53:01

I have recently been working with numpy, using matrices in an algorithm, and I ran into a problem:

I am using 3 matrices in total.

m1 = [[  3   2   2 ...   3   2   3]
      [  3   3   3 ...   2   2   2]
      [500 501 502 ... 625 626 627]
      ...
      [623 624 625 ... 748 749 750]
      [624 625 626 ... 749 750 751]
      [625 626 627 ... 750 751 752]]

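m1 is a (128,128) singular square matrix. The first two rows are seemingly random sequences of 2s and 3s. The remaining rows are filled algorithmically, starting with 500 in the third row, first column, and adding one for each step along a row or down a column.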

m2 = [[  2   3 500 ... 623 624 625]
      [  2   2 500 ... 623 624 625]
      [  3   2 500 ... 623 624 625]
      ...
      [  2   3 500 ... 623 624 625]
      [  2   2 500 ... 623 624 625]
      [  3   2 500 ... 623 624 625]]

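m2 is also a (128,128) singular square matrix. This time the random sequences occupy the first two columns. The rest of each row is filled with 500, 501, 502, 503, and so on.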

m3 = [[     790      784   157500 ...   196245   196560   196875]
      [     804      811   161000 ...   200606   200928   201250]
      [  180501   180411 36064000 ... 44935744 45007872 45080000]
      ...
      [  219861   219771 43936000 ... 54744256 54832128 54920000]
      [  220181   220091 44000000 ... 54824000 54912000 55000000]
      [  220501   220411 44064000 ... 54903744 54991872 55080000]]

m3 = m1*m2

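What I want to do is recover m2 from m1 and m3. In theory, all I have to do is execute m2 = (m1**-1)*m3. Unfortunately, since m1 is singular, its inverse cannot be computed, and even if it could, the matrix is so large that the result would suffer from a lot of numerical imprecision.

Instead, I decided to use the Moore-Penrose inverse of m1, which does not require the matrix to be non-singular and, like the ordinary inverse, can in theory recover m2 using np.linalg.pinv(m1) * m3.

I say "in theory" again because, as it turns out, numpy is too imprecise for this kind of computation on large matrices. This is the result I get for m2: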

[[  2.46207616   2.48959603 500.         ... 623.         624.
  625.        ]
 [  2.38612549   2.61197086 500.         ... 623.         624.
  625.        ]
 [  2.38711085   2.6125801  500.         ... 623.         624.
  625.        ]
 ...
 [  2.61998539   2.37184747 500.         ... 623.         624.
  625.        ]
 [  2.54403472   2.4942223  500.         ... 623.         624.
  625.        ]
 [  2.62195611   2.37306595 500.         ... 623.         624.
  625.        ]]

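As you can see, the entire "fill" part of m2 is computed correctly, without any problem. The first two columns, however, are wrong: rounding the numbers to 2 and 3 gives me an incorrect m2.

I am looking for a way to make the floating-point computation in np.linalg.pinv() more precise, so that it recovers the correct values of these sequences, because those values are very important.

Through some research I learned that np.linalg.pinv() has a parameter called rcond, described as follows: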

rcond : (…) array_like of float

Cutoff for small singular values. Singular values smaller (in modulus) than rcond * largest_singular_value (again, in modulus) are set to zero. Broadcasts against the stack of matrices
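To make the cutoff described above concrete, here is a minimal sketch. The 3x3 diagonal matrix is a hypothetical example, not one of the matrices from the question; it is chosen so that its singular values (1, 1e-8, 1e-16) can be read off directly.

```python
import numpy as np

# Hypothetical example: a diagonal matrix whose singular values are
# simply its diagonal entries 1, 1e-8 and 1e-16.
a = np.diag([1.0, 1e-8, 1e-16])

# rcond=1e-15: the cutoff is 1e-15 * 1.0, so 1e-16 falls below it
# and is treated as an exact zero.
p_default = np.linalg.pinv(a, rcond=1e-15)

# rcond=1e-17: the cutoff drops below 1e-16, so that tiny value is
# kept and inverted into a huge entry.
p_small = np.linalg.pinv(a, rcond=1e-17)

print(p_default[2, 2])
print(p_small[2, 2])
```

If the tiny singular value is really rounding noise standing in for an exact zero, inverting it amplifies that noise enormously, which is one plausible reading of why lowering rcond tends to make results worse rather than better.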

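By default, rcond is set to 1e-15. I thought that lowering it further might help remove the imprecision. 1e-16 is not enough, and from 1e-17 onward I get very strange values, such as: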

[[ 3.000e+00  3.000e+00  5.000e+02 ...  6.230e+02  6.240e+02  6.250e+02]
 [ 1.100e+01  4.000e+00  1.840e+02 ...  1.722e+03  2.032e+03  1.831e+03]
 [-3.000e+00 -5.000e+00 -4.030e+02 ... -1.232e+03 -7.400e+02 -1.272e+03]
 ...
 [ 1.100e+01  1.200e+01  2.164e+03 ...  4.030e+03  4.872e+03  1.873e+03]
 [-1.200e+01 -9.000e+00 -1.618e+03 ... -3.240e+03 -2.519e+03 -4.167e+03]
 [ 2.000e+01  2.600e+01  4.535e+03 ...  5.165e+03  5.881e+03  5.189e+03]]

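So basically I am stuck and do not know how to improve the precision. The worst part is that I have a module that can significantly increase floating-point precision, called mpmath, and it also has matrices that seem to work well with my algorithm, just like the numpy ones. But mpmath has no method for computing the pseudoinverse, and numpy does not adjust its own floating-point precision to the value set in mpmath.

Do you have any suggestions? Is there something I can try in order to obtain the correct m2 using the pseudoinverse approach?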

Best answer

Your trouble has nothing to do with whether pinv is accurate or not.

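As you noted, your matrices are severely rank-deficient: m1 has rank 4 or less, and m2 has rank 3 or less. Your system m1@x = m3 is therefore extremely underdetermined, and m2 cannot be recovered from it.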
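The rank argument can be checked numerically. The sketch below is an illustration under stated assumptions: the size n = 16, the seed, and the explicit tolerances (tol=1e-8, rcond=1e-10) are my own choices, and the matrices are only built to follow the structure described in the question. It shows that the pseudoinverse solution reproduces m3, that a different matrix obtained by adding a null-space vector of m1 reproduces m3 as well, and that the "fill" columns come back because they are constant vectors, which lie in the row space of m1.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for repeatability only
n = 16                           # assumed small size; the question uses 128

# m1: rows 0-1 are random 2s and 3s, the other rows count up from 500.
m1 = np.add.outer(np.arange(n), np.arange(498, 498 + n))
m1[:2] = rng.integers(2, 4, (2, n))
# m2: columns 0-1 are random 2s and 3s, the other columns are 500, 501, ...
m2 = np.tile(np.arange(498, 498 + n), (n, 1))
m2[:, :2] = rng.integers(2, 4, (n, 2))
m3 = m1 @ m2

# Numerical ranks, with an explicit tolerance chosen for this sketch.
rank1 = np.linalg.matrix_rank(m1, tol=1e-8)
rank2 = np.linalg.matrix_rank(m2, tol=1e-8)
print("rank(m1) =", rank1, " rank(m2) =", rank2)

# The pseudoinverse picks one particular solution of m1 @ x = m3.
x0 = np.linalg.pinv(m1.astype(float), rcond=1e-10) @ m3
print("m1 @ x0 reproduces m3:   ", np.allclose(m1 @ x0, m3))
print("fill columns match m2:   ", np.allclose(x0[:, 2:], m2[:, 2:]))
print("first two columns match: ", np.allclose(x0[:, :2], m2[:, :2]))

# Any null-space vector of m1 can be added to a column without changing m3.
_, _, vt = np.linalg.svd(m1.astype(float))
null_vec = vt[-1]                # right-singular vector of a zero singular value
alt = m2.astype(float)
alt[:, 0] += null_vec
print("perturbed m2 also works: ", np.allclose(m1 @ alt, m3))
```

Under these assumptions the first two columns of x0 should differ from those of m2 even though m1 @ x0 matches m3, which is the behaviour reported in the question.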

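Even if we throw in everything we know about the structure of m2, namely that the first two columns contain only 3s and 2s and the rest counts up from 500, there is still a combinatorially large number of solutions.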
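For a very small matrix this can be checked by brute force. The following sketch is not the answer's algorithm: it simply enumerates every vector of 2s and 3s for a hypothetical n = 12 (with an assumed seed) and keeps those that give the same product with m1 as the true column does.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)   # assumed seed, for repeatability only
n = 12                           # assumed size, small enough to enumerate 2**n vectors

# m1 with the structure from the question: two random rows, then counting rows.
m1 = np.add.outer(np.arange(n), np.arange(498, 498 + n))
m1[:2] = rng.integers(2, 4, (2, n))

true_col = rng.integers(2, 4, n)   # plays the role of one random column of m2
target = m1 @ true_col             # the corresponding column of m3

# All length-n vectors made of 2s and 3s, one per row: shape (2**n, n).
candidates = np.array(list(itertools.product((2, 3), repeat=n)))
# Row k of (candidates @ m1.T) equals m1 @ candidates[k]; compare exactly,
# since everything here is integer arithmetic.
matches = candidates[np.all(candidates @ m1.T == target, axis=1)]

print(len(matches), "candidate columns reproduce the same column of m3")
```

The true column is always among the matches. Whenever more than one match is found, the column cannot be identified from m1 and m3 alone, and the number of matches is expected to grow quickly with n.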

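Given enough time, the script below finds all of them. In practice I did not go beyond 32x32 matrices, which in the run shown below yielded 15093381006 distinct valid reconstructions m2' satisfying m1@m2' = m3 and the structural constraints just mentioned.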

import numpy as np
import itertools

def make(n, offset=500):
    offset -= 2
    res1 = np.add.outer(np.arange(n), np.arange(offset, offset+n))
    res1[:2] = np.random.randint(2, 4, (2, n))
    res2 = np.add.outer(np.zeros(n, int), np.arange(offset, offset+n))
    res2[:, :2] = np.random.randint(2, 4, (n, 2))
    return res1, res2

def subsets(L, n, mn, mx, prepend=[]):
    if n == 0:
        if mx >= mn:
            yield prepend
    elif n == 1:
        for l in L[L.searchsorted(mn):L.searchsorted(mx, 'right')]:
            yield prepend + [l]
    else:
        ps = L.cumsum()
        ps[n:] -= ps[:-n]
        ps = ps[n-1:]
        for i in range(L.searchsorted(mn - np.sum(L[len(L)-n+1:])),
                       ps.searchsorted(mx, 'right')):
            yield from subsets(L[i+1:], n-1, mn - L[i], mx - L[i],
                               prepend = prepend + [L[i]])

def solve(m1, m3, ci=0, offset=500):
    n, n = m1.shape
    col = m3.T[ci]
    n3s = col[3] - col[2] - 2 * n
    six = col[2] - offset * (col[3] - col[2]) - n * (n-1)
    idx = np.lexsort(m1[:2])
    m1s = m1[:2, idx]
    sm = m1s[1].searchsorted(2.5)
    sl = m1s[0, :sm].searchsorted(2.5)
    sr = sm + m1s[0, sm:].searchsorted(2.5)
    n30 = n - sl - sr + sm
    n31 = n - sm
    n330 = col[0] - 4*n - 2*n3s - 2*n30
    n331 = col[1] - 4*n - 2*n3s - 2*n31
    for n333 in range(max(0, n330 - sm + sl, n331 - sr + sm),
                      min(n - sr, n330, n331) + 1):
        n332 = n330 - n333
        n323 = n331 - n333
        n322 = n3s - n332 - n323 - n333
        mx333 = six - idx[sl:sl+n332].sum() - idx[sm:sm+n323].sum() \
                - idx[:n322].sum()
        mn333 = six - idx[sm-n332:sm].sum() - idx[sr-n323:sr].sum() \
                - idx[sl-n322:sl].sum()
        for L333 in subsets(idx[sr:], n333, mn333, mx333):
            mx332 = six - np.sum(L333) - idx[sm:sm+n323].sum() \
                - idx[:n322].sum()
            mn332 = six - np.sum(L333) - idx[sr-n323:sr].sum() \
                - idx[sl-n322:sl].sum()
            for L332 in subsets(idx[sl:sm], n332, mn332, mx332,
                                prepend=L333):
                mx323 = six - np.sum(L332) - idx[:n322].sum()
                mn323 = six - np.sum(L332) - idx[sl-n322:sl].sum()
                for L323 in subsets(idx[sm:sr], n323, mn323, mx323,
                                    prepend=L332):
                    ex322 = six - np.sum(L323)
                    yield from subsets(idx[:sl], n322, ex322, ex322,
                                       prepend=L323)

def recon(m1, m3, ci=0, offset=500):
    n, n = m1.shape
    nsol = nfp = 0
    REC = []
    for i3s in solve(m1, m3, ci, offset):
        rec = np.full(n, 2)
        rec[i3s] = 3
        if not np.all(m3.T[ci] == m1@rec):
            print('!', rec, m3.T[ci], m1@rec)
            nfp += 1
        else:
            nsol += 1
            REC.append(rec)
    print('col', ci, ':',  nsol, 'solutions,', nfp, 'false positives')
    return np.array(REC)

def full_recon(m1, m3, offset=500, subsample=10):
    n, n = m1.shape
    col0, col1 = (recon(m1, m3, i, offset) for i in (0, 1))
    yield col0.shape[0], col1.shape[0]
    if subsample is not None:
        col0, col1 = (col[np.random.choice(col.shape[0], subsample)]
                      if col.shape[0] > subsample else col
                      for col in (col0, col1))
    print('col 0', col0)
    print('col 1', col1)
    for c0, c1 in itertools.product(col0, col1):
        out = np.add.outer(np.zeros(n, int), np.arange(offset-2, offset+n-2))
        out[:, :2] = np.c_[c0, c1]
        yield out

def check(m1, m3, offset=500, subsample=10):
    for cnt, m2recon in enumerate(full_recon(m1, m3, offset, subsample)):
        if cnt == 0:
            tot0, tot1 = m2recon
            continue
        assert np.all(m3 == m1@m2recon)
    print(cnt, 'solutions verified out of', tot0, 'x', tot1, '=', tot0 * tot1)

Example run:

>>> m1, m2 = make(32)
>>> check(m1, m1@m2)
col 0 : 133342 solutions, 0 false positives
col 1 : 113193 solutions, 0 false positives
col 0 [[2 2 3 2 2 2 3 2 3 3 3 3 3 2 3 3 2 2 2 2 2 3 3 3 2 2 2 2 2 2 2 2]
 [2 2 3 2 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 3 3 3 2 2 2 3 2 3 2 2 2 2]
 [2 3 3 2 3 3 2 2 2 3 2 3 3 2 2 2 2 3 3 3 2 2 2 2 3 2 2 2 2 2 2 3]
 [2 2 3 3 2 2 3 3 3 2 2 3 2 3 3 3 2 2 2 2 2 2 2 3 2 3 3 2 2 2 2 2]
 [2 3 3 3 2 3 3 2 2 2 2 3 2 2 3 3 2 2 3 2 2 3 2 2 2 2 2 2 3 3 2 2]
 [3 2 2 2 3 2 3 3 3 2 3 2 2 2 2 3 3 3 2 2 2 2 3 3 2 3 2 2 2 2 2 2]
 [2 2 2 3 3 3 2 2 3 3 3 3 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2 2 3 2 2 2]
 [2 2 2 3 3 2 3 3 3 2 3 2 2 3 3 3 2 3 2 2 2 2 2 2 2 2 2 3 2 3 2 2]
 [3 2 3 3 2 2 3 2 2 3 2 3 3 2 2 2 3 2 3 2 2 2 2 3 2 3 2 2 3 2 2 2]
 [3 2 2 3 3 2 3 3 2 2 2 3 2 3 2 2 3 2 3 3 2 2 2 2 2 3 2 2 2 2 2 3]]
col 1 [[2 2 2 2 3 3 3 3 3 3 3 3 2 2 3 2 2 2 3 3 2 3 2 2 2 3 2 3 2 2 2 3]
 [2 3 3 2 3 3 2 2 2 3 3 3 2 2 3 3 2 3 2 3 2 2 3 2 2 2 2 3 3 2 2 3]
 [3 2 3 2 2 3 3 2 2 3 3 3 2 2 3 2 3 2 3 2 3 2 3 3 2 2 2 2 2 3 3 2]
 [2 2 3 2 3 3 3 2 3 3 2 3 2 3 2 3 2 3 2 2 3 2 3 2 2 3 2 3 2 2 2 3]
 [3 3 2 2 2 2 3 2 3 3 3 3 2 3 3 2 2 3 3 2 2 2 2 2 3 2 2 3 3 3 2 2]
 [3 3 2 3 2 2 2 3 3 3 2 2 2 3 2 2 2 3 3 3 3 2 3 3 2 2 2 3 3 2 2 2]
 [2 3 2 3 2 2 3 3 2 3 3 3 2 3 2 2 3 3 2 3 2 2 3 2 3 2 2 3 2 2 3 2]
 [2 3 2 3 2 3 3 3 3 3 3 2 2 2 2 2 3 2 3 2 2 2 3 2 2 3 3 2 3 2 2 3]
 [3 2 3 2 2 3 3 3 2 2 3 3 2 2 3 3 2 3 2 2 3 2 2 3 2 2 3 2 3 2 2 3]
 [3 2 3 3 2 3 2 2 2 3 3 3 2 3 3 2 2 2 3 2 3 2 2 3 2 2 3 2 2 2 3 3]]
100 solutions verified out of 133342 x 113193 = 15093381006

Regarding "Python numpy: linalg.pinv() is too imprecise", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50546710/
