python - Efficiently setting 1D range values in a DataFrame (or ndarray) using a boolean array


Reposted by: 行者123  Updated: 2023-12-01 09:10:30

Prerequisites

import numpy as np
import pandas as pd

INPUT1: a 2D boolean array (example below)

x = np.array(
    [[False,False,False,False,True],
     [True,False,False,False,False],
     [False,False,True,False,True],
     [False,True,True,False,False],
     [False,False,False,False,False]])

INPUT2: 1D range values (example below)

y=np.array([1,2,3,4])

Expected output: a 2D ndarray

   [[0,0,0,0,1],
    [1,0,0,0,2],
    [2,0,1,0,1],
    [3,1,1,0,2],
    [4,2,2,0,3]]

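I want to efficiently write the range values as a vertical vector starting at each True in the 2D ndarray (INPUT1), running down the column from that position. Where vectors overlap, the one starting lower in the array wins, and anything running past the bottom edge is cut off. Is there a useful API or solution for this?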

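Best answer

Unfortunately I couldn't come up with an elegant solution, so I came up with several inelegant ones. The two main approaches I could think of are

brute-force looping over each True value and assigning slices, and
using a single indexed assignment to replace the necessary values.

It turns out that the time complexity of these approaches is non-trivial, so depending on the size of your array either one can be faster.

Using your example input: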

import numpy as np

x = np.array(
    [[False,False,False,False,True],
     [True,False,False,False,False],
     [False,False,True,False,True],
     [False,True,True,False,False],
     [False,False,False,False,False]])
y = np.array([1,2,3,4])
refout = np.array([[0,0,0,0,1],
    [1,0,0,0,2],
    [2,0,1,0,1],
    [3,1,1,0,2],
    [4,2,2,0,3]])

# alternative input with arbitrary size:
# N = 100; x = np.random.rand(N,N) < 0.2; y = np.arange(1,N)

def looping_clip(x, y):
    """Loop over Trues, use clipped slices"""
    nmax = x.shape[0]
    n = y.size

    # initialize output
    out = np.zeros_like(x, dtype=y.dtype)
    # loop over True values
    for i,j in zip(*x.nonzero()):
        # truncate right-hand side where necessary
        out[i:i+n, j] = y[:nmax-i]
    return out

def looping_expand(x, y):
    """Loop over Trues, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()

    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)
    # loop over True values
    for i,j in zip(ivals, jvals):
        # slice will always be complete, i.e. of length y.size
        out[i:i+n, j] = y
    return out[:nmax, :].copy() # rather not return a view to an auxiliary array

def index_2d(x, y):
    """Assign directly with 2d indices, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()

    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)

    # now we can safely index for each "(ivals:ivals+n, jvals)" so to speak
    upped_ivals = ivals[:,None] + np.arange(n) # shape (ntrues, n)
    upped_jvals = jvals.repeat(y.size).reshape(-1, n) # shape (ntrues, n)

    out[upped_ivals, upped_jvals] = y # right-hand side of shape (n,) broadcasts

    return out[:nmax, :].copy() # rather not return a view to an auxiliary array

def index_1d(x,y):
    """Assign using linear indices, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()

    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)

    # grab linear indices corresponding to Trues in a buffed-up array
    inds = np.ravel_multi_index((ivals, jvals), out.shape)

    # now all we need to do is start stepping along rows for each item and assign y
    upped_inds = inds[:,None] + mmax*np.arange(n) # shape (ntrues, n)

    out.flat[upped_inds] = y  # y of shape (n,) broadcasts to (ntrues, n)

    return out[:nmax, :].copy() # rather not return a view to an auxiliary array


# check that the results are correct
print(all([np.array_equal(refout, looping_clip(x,y)),
           np.array_equal(refout, looping_expand(x,y)),
           np.array_equal(refout, index_2d(x,y)),
           np.array_equal(refout, index_1d(x,y))]))

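I tried to document each function, but here's a synopsis:

looping_clip loops over every True value in the input and assigns to the corresponding slice in the output. When part of the slice would run past the edge of the array along the first dimension, we take care to shorten the assigned array on the right-hand side.
looping_expand loops over every True value in the input and assigns to the corresponding full slice in the output, after allocating a padded output array that guarantees every slice is full. We do more work by allocating a larger output array, but we don't have to shorten the right-hand side on assignment. We could omit the .copy() call in the last step, but I prefer not to return a non-trivially strided array (i.e. a view into an auxiliary array rather than a proper copy), since this could lead to obscure surprises for the user.
index_2d computes the 2d indices of every value to be assigned, and assumes that duplicate indices are handled in order. This is not guaranteed! (More on this later.)
index_1d does the same using linearized indices, indexing into the flatiter of the output.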
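To make the index construction in index_2d and index_1d more concrete, here is a minimal sketch on made-up inputs. The values of ivals, jvals, n and shape are illustrative assumptions (two True values in column 4 of a 5x5 array, and a vector of length 3), not taken from the answer.

```python
import numpy as np

# Hypothetical positions of two True values: (0, 4) and (2, 4)
ivals = np.array([0, 2])
jvals = np.array([4, 4])
n = 3               # assumed length of the vector written below each True
shape = (5, 5)      # assumed array shape; all target rows stay in bounds here

# 2d variant: one row of target row-indices per True value
rows = ivals[:, None] + np.arange(n)                 # [[0, 1, 2], [2, 3, 4]]
cols = np.broadcast_to(jvals[:, None], rows.shape)   # [[4, 4, 4], [4, 4, 4]]

# 1d variant: linear index of each True, then step down one row at a time
start = np.ravel_multi_index((ivals, jvals), shape)  # [4, 14]
flat = start[:, None] + shape[1] * np.arange(n)      # [[4, 9, 14], [14, 19, 24]]

# Position (2, 4), i.e. linear index 14, is targeted twice: a duplicate index
print(rows, cols, flat, sep="\n")
```

The repeated target (2, 4) is exactly the kind of duplicate whose write order matters in the indexing versions.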

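I timed the above functions on random arrays (see the commented-out line near the beginning). [Timing plot not included.]

We can see that for small and large arrays the looping versions are faster, but for linear sizes roughly between 10 and 150 the indexing versions are better. I didn't go to larger sizes because the indexing versions start to use a lot of memory, and I didn't want that to muddy the timings.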
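Timings like these depend on the machine and NumPy version, so it may be worth measuring them yourself. The following is a minimal sketch using the standard timeit module; time_method and its parameters (density, repeats, seed) are hypothetical names introduced here, and looping_clip is repeated from the answer only so that the block runs on its own.

```python
import timeit
import numpy as np

def looping_clip(x, y):
    """Loop over Trues, use clipped slices (repeated from the answer)."""
    nmax = x.shape[0]
    n = y.size
    out = np.zeros_like(x, dtype=y.dtype)
    for i, j in zip(*x.nonzero()):
        out[i:i+n, j] = y[:nmax-i]
    return out

def time_method(func, N, density=0.2, repeats=3, seed=0):
    """Best-of-`repeats` wall time in seconds for func on a random N x N input."""
    rng = np.random.default_rng(seed)
    x = rng.random((N, N)) < density   # same construction as the commented-out line
    y = np.arange(1, N)
    return min(timeit.repeat(lambda: func(x, y), number=1, repeat=repeats))

for N in (10, 100, 300):
    print(N, time_method(looping_clip, N))
```

Passing any of the other functions instead of looping_clip gives a comparable number for the same input.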

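To make matters worse, note that the indexing versions assume that duplicate indices in a fancy-indexing assignment are handled in order, so that when True values that are "lower" in the array are handled, the earlier values get overwritten as you require. There's only one problem: this is not guaranteed: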

For advanced assignments, there is in general no guarantee for the iteration order. This means that if an element is set more than once, it is not possible to predict the final result.

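That doesn't sound very encouraging. In my experiments the indices did seem to be handled in order (according to C order), but this could be a coincidence or an implementation detail. So if you want to use the indexing versions, make sure this still holds on your specific version and for your specific sizes and shapes.

We can make the assignment safer by removing the duplicate indices ourselves. For this we can make use of this answer by Divakar on a corresponding question: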
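One way to check this on your own setup is to compare an indexing version against a plain loop on random inputs. The sketch below is only an empirical check under stated assumptions: loop_reference, fancy_index and indexing_matches_loop are names made up here (compact variants of the answer's functions), and a True result shows agreement for the tested cases only, not a guarantee.

```python
import numpy as np

def loop_reference(x, y):
    """Order-safe reference: later (lower) Trues overwrite earlier ones."""
    out = np.zeros(x.shape, dtype=y.dtype)
    for i, j in zip(*x.nonzero()):
        out[i:i + y.size, j] = y[:x.shape[0] - i]
    return out

def fancy_index(x, y):
    """Single advanced assignment; relies on duplicates being written in order."""
    n = y.size
    nrows, ncols = x.shape
    ivals, jvals = x.nonzero()
    out = np.zeros((nrows + n, ncols), dtype=y.dtype)  # padded so no slice is cut off
    rows = ivals[:, None] + np.arange(n)               # shape (ntrues, n)
    out[rows, jvals[:, None]] = y                      # y broadcasts to (ntrues, n)
    return out[:nrows].copy()

def indexing_matches_loop(sizes=(5, 50, 200), trials=5, seed=0):
    """Return True if both functions agree on every random test input."""
    rng = np.random.default_rng(seed)
    for size in sizes:
        for _ in range(trials):
            x = rng.random((size, size)) < 0.2
            y = np.arange(1, size)
            if not np.array_equal(loop_reference(x, y), fancy_index(x, y)):
                return False
    return True

print(indexing_matches_loop())
```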

def index_1d_safe(x,y):
    """Same as index_1d but use Divakar's safe solution for reducing duplicates"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()

    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)

    # grab linear indices corresponding to Trues in a buffed-up array
    inds = np.ravel_multi_index((ivals, jvals), out.shape)

    # now all we need to do is start stepping along rows for each item and assign y
    upped_inds = inds[:,None] + mmax*np.arange(n) # shape (ntrues, n)

    # now comes https://stackoverflow.com/a/44672126
    # need additional step: flatten upped_inds and corresponding y values for selection
    upped_flat_inds = upped_inds.ravel() # shape (ntrues, n) -> (ntrues*n,)
    y_vals = np.broadcast_to(y, upped_inds.shape).ravel() # shape (ntrues, n) -> (ntrues*n,)

    sidx = upped_flat_inds.argsort(kind='mergesort')
    sindex = upped_flat_inds[sidx]
    idx = sidx[np.r_[np.flatnonzero(sindex[1:] != sindex[:-1]), upped_flat_inds.size-1]]
    out.flat[upped_flat_inds[idx]] = y_vals[idx]

    return out[:nmax, :].copy() # rather not return a view to an auxiliary array
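
The de-duplication step may be easier to follow in isolation. Here is a minimal sketch on toy data: flat_inds and vals are made-up values, not from the answer. A stable sort groups equal indices while keeping their original order, so the last entry of each group is the write that should win.

```python
import numpy as np

# Hypothetical flat target indices (with repeats) and the values to write
flat_inds = np.array([3, 1, 3, 0, 1, 3])
vals = np.array([10, 20, 30, 40, 50, 60])

# Stable sort: equal indices keep their original relative order
sidx = flat_inds.argsort(kind='mergesort')
sorted_inds = flat_inds[sidx]

# Mark the last element of each run of equal indices
is_last = np.r_[sorted_inds[1:] != sorted_inds[:-1], True]
keep = sidx[is_last]             # original positions of the winning writes

out = np.zeros(5, dtype=vals.dtype)
out[flat_inds[keep]] = vals[keep]   # each index occurs once, so order is irrelevant
print(out)                          # [40 50  0 60  0]
```

Because every target index now appears exactly once, the result no longer depends on the order in which NumPy performs the assignment.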

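index_1d_safe still reproduces your expected output. The problem is that the function now takes much longer to complete. [Timing plot not included.]

Bummer. Considering that my indexing versions are only faster for intermediate array sizes, and that the faster versions are not guaranteed to work, it is probably simplest to just use one of the looping versions. Of course, this is not to say that I haven't missed some optimal vectorized solution.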
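
As one possible direction for such a vectorized solution, here is a sketch that is not part of the original answer and has not been timed against the functions above. The idea is to find, for every cell, the nearest True at or above it in the same column using a running maximum, and then look up y by that distance. It avoids duplicate indices entirely; fill_from_last_true is a name made up for this sketch.

```python
import numpy as np

def fill_from_last_true(x, y):
    """For each cell, write y[d], where d is the distance to the nearest True
    at or above it in the same column (0 if there is none or d >= y.size)."""
    nrows = x.shape[0]
    rows = np.arange(nrows)[:, None]             # shape (nrows, 1)
    marked = np.where(x, rows, -1)               # row index at Trues, -1 elsewhere
    last = np.maximum.accumulate(marked, axis=0) # nearest True at or above each cell
    dist = rows - last
    valid = (last >= 0) & (dist < y.size)
    out = np.zeros(x.shape, dtype=y.dtype)
    out[valid] = y[dist[valid]]                  # boolean mask: no repeated targets
    return out

x = np.array(
    [[False,False,False,False,True],
     [True,False,False,False,False],
     [False,False,True,False,True],
     [False,True,True,False,False],
     [False,False,False,False,False]])
y = np.array([1,2,3,4])
print(fill_from_last_true(x, y))
```

On the example input this prints the expected output from the question. It uses a few temporary arrays of the same shape as x, so its memory use grows with the array size rather than with the number of True values times y.size.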

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51694663/
