Find blocks of numbers in 2D numpy array

Reposted. Author: bug小助手. Updated: 2023-10-25 17:06:14
Start by building the following numpy array:


import numpy as np
import itertools as it

a = np.array([[j for j in range(1,5)] for i in range(3)])

table = np.array(list(it.product(*a)))

which looks something like,


[[1 1 1]
 [1 1 2]
 [1 1 3]
 ...

i.e. the Cartesian product of three copies of the array [1,2,3,4]. Next, if column 1 or 2 (but not 3) of any row contains a 3, I set all elements in the following columns of that row to 3 (for example, a row reading [3 1 1] becomes [3 3 3]):

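for row in table:
    if len(np.where(row==3)[0])>0:
        # Fill from the first 3 to the end of the row
        # (this line is inferred from the description above).
        row[np.where(row==3)[0][0]:] = 3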
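The same fill can also be written without a Python-level loop. What follows is a minimal sketch, not the original code: it assumes the rule is exactly "everything from the first 3 in a row onward becomes 3", and the names `from_first_3` and `filled` are made up for illustration.

```python
import itertools as it

import numpy as np

# Same table as above: Cartesian product of three copies of [1, 2, 3, 4].
table = np.array(list(it.product(range(1, 5), repeat=3)))

# Running OR along each row: True from the first 3 of the row onward.
from_first_3 = np.logical_or.accumulate(table == 3, axis=1)

filled = table.copy()      # hypothetical name; leaves `table` untouched
filled[from_first_3] = 3

print(filled[32])  # the row that was [3 1 1]
# [3 3 3]
```

A 3 that appears only in the last column is left alone, since there is nothing to its right to overwrite.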

Next I have to identify the blocks of 3s by storing four values for each one: the row and column where it starts and the row and column where it ends. To do this I use yet another unpythonic for loop:

nr, nc = table.shape
blocks = []
bh = nr
for i in range(nc-1):
    bh = bh // 4                     # block height for column i
    for j in range(nr // bh):
        temp = table[j*bh:(j+1)*bh,i:]
        if temp[0,0]==3 and temp[-1,-1]==3:
            ...  # loop body truncated in the source

and get,


[[[33, 1], [48, 3]], [[9, 2], [12, 3]], [[25, 2], [28, 3]], [[57, 2], [60, 3]]]
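The body of the inner `if` is not shown in the loop above. The sketch below is one possible reconstruction that reproduces the listed result. It rests on two assumptions inferred from that output rather than taken from the original code: coordinates are stored 1-based, and a candidate block is skipped when the column to its left already holds a 3 (so the rows inside the big block are not reported again).

```python
import itertools as it

import numpy as np

table = np.array(list(it.product(range(1, 5), repeat=3)))
# Fill step from the question: everything from the first 3 onward becomes 3.
table[np.logical_or.accumulate(table == 3, axis=1)] = 3

nr, nc = table.shape
blocks = []
bh = nr
for i in range(nc - 1):
    bh //= 4                                  # block height for column i
    for j in range(nr // bh):
        temp = table[j * bh:(j + 1) * bh, i:]
        if temp[0, 0] == 3 and temp[-1, -1] == 3:
            # Assumption: skip sub-blocks that lie inside a larger block.
            if i > 0 and table[j * bh, i - 1] == 3:
                continue
            # Assumption: 1-based [row, column] for start and end.
            blocks.append([[j * bh + 1, i + 1], [(j + 1) * bh, nc]])

print(blocks)
# [[[33, 1], [48, 3]], [[9, 2], [12, 3]], [[25, 2], [28, 3]], [[57, 2], [60, 3]]]
```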

This is, obviously, a very inefficient way to use numpy. Can someone suggest a more pythonic way to do this?



Regarding "column where they end": for the block [9, 2] the ending block is [12, 2]. The same goes for block [25, 2] and the others. Why point to 3?


The table is 64 x 3. Using 1...n indexing instead of 0...n-1, the biggest block of 3s is located at (33,1),(48,3); the next blocks of 3s are located at (9,2),(12,3), (25,2),(28,3) and (57,2),(60,3). This set of numbers is going to be used in another program to generate some graphics. Look at the routine used to generate the blocks of 3s to understand why all of them end in column 3.

My note was not about your routine, but about your conceptually vague description.


Apologies. I was just trying to get the coordinates of the rectangular blocks of 3s.



I couldn't get rid of all the loops, but at least removed the nested ones.


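import itertools as it
import numpy as np

a = np.array([[j for j in range(1,5)] for i in range(3)])
table = np.array(list(it.product(*a)))

def mask_approach(table):
    # Boolean mask of the 3s; np.where returns their indices in row-major order.
    bool_mask = table == 3
    rows_of_3s, columns_of_3s = np.where(bool_mask)

    output = []
    flag = False  # True while inside a run of 3s
    stop_iter = len(rows_of_3s)-1
    for i, v in enumerate(rows_of_3s):
        if i == stop_iter:
            # Last 3: there is no rows_of_3s[i+1] to compare with, so close
            # any open run and stop (the body of this branch is missing in
            # the source and is inferred here).
            if flag:
                output.append((start, (v, columns_of_3s[i])))
            break
        if rows_of_3s[i+1] - v <= 1:
            # The next 3 is in the same or the following row: the run continues.
            if not flag:
                start = (v, columns_of_3s[i])
                flag = True
        elif flag:
            # Gap of more than one row: the current run ends here.
            flag = False
            output.append((start, (v, columns_of_3s[i])))
    return output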

I take advantage of masks and use only the boolean array, as it contains all the information needed to solve this problem.
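The boolean-mask idea can be pushed a little further so that the only remaining Python loop runs over the columns. This is a sketch under stated assumptions, not part of the answer above: `find_blocks` is a hypothetical helper, it applies the question's fill rule itself, it reports 1-based coordinates, and it treats every contiguous run of rows in a column as one block that extends to the last column.

```python
import itertools as it

import numpy as np


def find_blocks(table, value=3):
    """Return [[start_row, start_col], [end_row, end_col]] per block, 1-based."""
    # True from the first `value` of each row onward (the question's fill rule).
    filled = np.logical_or.accumulate(table == value, axis=1)
    nr, nc = filled.shape
    blocks = []
    for c in range(nc - 1):
        col = filled[:, c]
        if c > 0:
            # Keep only rows whose block starts in this column.
            col = col & ~filled[:, c - 1]
        # Pad with False so runs touching the first or last row get both edges.
        padded = np.concatenate(([False], col, [False]))
        edges = np.flatnonzero(padded[1:] != padded[:-1])
        starts, ends = edges[::2], edges[1::2]  # ends are exclusive, 0-based
        blocks.extend([[int(s) + 1, c + 1], [int(e), nc]]
                      for s, e in zip(starts, ends))
    return blocks


table = np.array(list(it.product(range(1, 5), repeat=3)))
print(find_blocks(table))
# [[[33, 1], [48, 3]], [[9, 2], [12, 3]], [[25, 2], [28, 3]], [[57, 2], [60, 3]]]
```

The edge detection compares each entry of a column with its neighbour, so the work per column is done in NumPy rather than in a Python loop over rows.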


My approach:



22.3 µs ± 485 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Your approach:



171 µs ± 9.96 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The solution certainly isn't perfect, but I think it shows a way of reducing the problem to an easier one, which may be useful for further optimization.



Nice one! Thanks!


This may not be entirely correct. From the question, the end column of any block must always be 2 (or 3, in 1-based indexing). You can then simply do output.append((start, (v, 2))) on the last line.


I think @Arcewirz missed the step where the input is converted to rectangular blocks of 3s when columns 1 to n-1 (but not n) contain a 3. But his routine can still be applied to the modified input.
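To check that claim, here is a loop-free sketch of the same scan applied to the modified input. It is an illustration, not the answer's code: it first applies the fill step from the question, then groups the row indices of the 3s wherever consecutive entries are more than one row apart, and finally converts to 1-based coordinates. Like the routine it imitates, it assumes that blocks are separated by at least one row without any 3, which happens to hold for this table.

```python
import itertools as it

import numpy as np

table = np.array(list(it.product(range(1, 5), repeat=3)))
# Fill step from the question: everything from the first 3 onward becomes 3.
table[np.logical_or.accumulate(table == 3, axis=1)] = 3

rows, cols = np.nonzero(table == 3)             # row-major order, like np.where
breaks = np.flatnonzero(np.diff(rows) > 1) + 1  # a new run starts at these positions
blocks = [
    ((int(r[0]) + 1, int(c[0]) + 1), (int(r[-1]) + 1, int(c[-1]) + 1))
    for r, c in zip(np.split(rows, breaks), np.split(cols, breaks))
    if len(r) > 1                               # drop isolated single 3s
]

print(blocks)
# [((9, 2), (12, 3)), ((25, 2), (28, 3)), ((33, 1), (48, 3)), ((57, 2), (60, 3))]
```

These are the same four blocks as in the question, listed in row order instead of column order.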


