python - Numpy and diff()

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:00:48

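I am trying to create a diff of a sorted numpy array, so that if I record the values of the first row plus the differences, I can recreate the original table while storing less data.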

Here is a sample of the table:

my_array = numpy.array([(0,  0,   0,  0,  0,   0,  0,  0,   0,  0,  0,   0,  0,  0), 
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)
],'uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8')

After running numpy.diff(my_array), I was expecting something like this:

[(0,  0,   0,  0,  0,   0,  0,  0,   0,  0,  0,   0,  0,  1), 
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 32),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
]

Note: The data above comes from the first and last three rows of the 'real' data, which is much larger. With the full dataset, most of the rows after a diff would be 0,0,0,0,0,0,0,0,0,0,0,0,0,1, which a) can be stored in a much smaller struct, and b) will compress very well on disk, since most rows contain nearly identical data.

I should probably point out that the reason I have a whole bunch of uint8s in the first place is that I needed to store an array of extremely large numbers in the smallest amount of memory possible. The largest number was 185439173519100986733232011757860, which is too big for uint64. In fact, the smallest number of bits needed to store it is 108 bits, or 14 bytes (rounded up to the nearest byte). So to fit these large numbers into numpy, I use the following two functions:

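def large_number_to_numpy(number, columns):
    # Split the integer into `columns` bytes, most significant byte first.
    return tuple((number >> (8*x)) & 255 for x in range(columns-1, -1, -1))

def numpy_to_large_number(numbers):
    # int() keeps the shift in arbitrary-precision Python integers, even when the values are numpy uint8 scalars.
    return sum(int(y) << (8*x) for x, y in enumerate(numbers[::-1]))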

Which is used like this:

>>> large_number_to_numpy(185439173519100986733232011757860L,14)
(9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L)

>>> numpy_to_large_number((9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L))
185439173519100986733232011757860L
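
The session above is Python 2 (note the `L` suffixes). On Python 3, the same big-endian byte split could be sketched with the built-in `int.to_bytes` and `int.from_bytes`. The function names `large_number_to_bytes` and `bytes_to_large_number` are hypothetical and not part of the original post:

```python
def large_number_to_bytes(number, columns):
    # Hypothetical Python 3 counterpart of large_number_to_numpy:
    # big-endian split into exactly `columns` bytes.
    # Raises OverflowError if the number does not fit in `columns` bytes.
    return tuple(number.to_bytes(columns, byteorder="big"))


def bytes_to_large_number(values):
    # Hypothetical inverse: rebuild the integer from big-endian byte values.
    return int.from_bytes(bytes(values), byteorder="big")


largest = 185439173519100986733232011757860
packed = large_number_to_bytes(largest, 14)
print(packed)
print(bytes_to_large_number(packed) == largest)
```

Unlike the shift-and-mask version, `to_bytes` refuses numbers that need more than `columns` bytes instead of silently dropping the high bits.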

With the array created like this:

my_array = numpy.zeros(TOTAL_ROWS,','.join(14*['uint8']))

And then populated with:

my_array[x] = large_number_to_numpy(large_number,14)

But what I get is:

>>> my_array
array([(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)],
dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])
>>> numpy.diff(my_array)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1567, in diff
    return a[slice1]-a[slice2]
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])

Best answer

The problem is that you have a structured array rather than a regular 2-D array, so numpy does not know how to subtract one tuple from another.
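
As a small illustration of the difference (a sketch, assuming a reasonably recent NumPy imported as `np`; the variable names are made up for this example), a structured array built from a comma-separated dtype string is one-dimensional, with each element being a record of 14 named fields:

```python
import numpy as np

# Same kind of dtype string as in the question: 14 unnamed uint8 fields.
dtype = ",".join(14 * ["uint8"])
structured = np.zeros(3, dtype=dtype)

print(structured.shape)        # one axis only: each element is a whole record
print(structured.dtype.names)  # auto-generated field names f0 ... f13

# Subtraction has no ufunc loop for record dtypes, so it fails.
try:
    structured[1:] - structured[:-1]
    raised = False
except TypeError:
    raised = True
print(raised)
```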

Convert the structured array to a regular array (from this SO question):

my_array = my_array.view(numpy.uint8).reshape((my_array.shape[0], -1))

Then run numpy.diff(my_array, axis=0).
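
Putting the conversion and the diff together on the sample rows from the question gives roughly the following sketch. The reconstruction step with `cumsum` is an addition for illustration and is not part of the original answer. Note that the subtraction is done per column in uint8, so it wraps modulo 256 rather than borrowing across bytes; the round trip still works because the cumulative sum wraps in the same way.

```python
import numpy as np

rows = [
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),
    (9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),
    (9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),
    (9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36),
]
structured = np.array(rows, dtype=",".join(14 * ["uint8"]))

# Reinterpret the records as a plain (rows, 14) uint8 array.
plain = structured.view(np.uint8).reshape((structured.shape[0], -1))

# Row-to-row differences, computed per column in uint8 (wraps modulo 256).
deltas = np.diff(plain, axis=0)
print(deltas)

# Rebuild the table from the first row plus the differences.
# Keeping everything in uint8 makes the wrap-around cancel out.
running = np.cumsum(deltas, axis=0, dtype=np.uint8)
restored = np.vstack([plain[:1], plain[0] + running])
print(np.array_equal(restored, plain))
```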

Alternatively, if you can, avoid creating the structured array in the first place by defining my_array as:

numpy.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
[9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34],
[9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35],
[9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36]],
dtype=numpy.uint8)
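
For the population loop described in the question, the same idea could look like the sketch below: allocate a plain two-dimensional uint8 array up front and assign one row per number. The list `large_numbers` is a hypothetical stand-in for the real data; `large_number_to_numpy` is the helper from the question.

```python
import numpy as np


def large_number_to_numpy(number, columns):
    # Helper from the question: big-endian split into `columns` bytes.
    return tuple((number >> (8 * x)) & 255 for x in range(columns - 1, -1, -1))


# Hypothetical sorted input values standing in for the real dataset.
top = 185439173519100986733232011757860
large_numbers = [0, 1, 2, top - 2, top - 1, top]

# Plain 2-D array instead of a structured one: shape (rows, 14), dtype uint8.
plain = np.zeros((len(large_numbers), 14), dtype=np.uint8)
for i, number in enumerate(large_numbers):
    plain[i] = large_number_to_numpy(number, 14)

# diff now works directly along the row axis.
deltas = np.diff(plain, axis=0)
print(deltas)
```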

Regarding python - Numpy and diff(), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38102566/
