
python - Transposing an h5py dataset without using a numpy array

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:42:44

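I need to transpose an h5py dataset so that I can access a 3D array as a stack of 2D images.

I want to be able to slice the 3D volume along any of its 3 possible directions, while always keeping the first dimension as the image index.

I don't want to convert my dataset into a numpy array, because that would read the entire dataset from disk even when only some of the images need to be displayed.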

Best answer

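Here is a solution using a proxy object that adds a layer on top of the dataset's __getitem__ method to take the transposition into account. It should work for any number of dimensions, but it has only been tested extensively in 3D.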

Example:

my_3D_dataset_201_transposition = TransposedDatasetView(
    my_3D_dataset,
    transposition=(2, 0, 1))
assert my_3D_dataset[i, j, k] == my_3D_dataset_201_transposition[k, i, j]
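
The index reordering behind that assertion can be sketched without h5py at all. The following is only an illustration under stated assumptions: `to_dataset_order` is a hypothetical helper name (it is not part of the class), and a nested list stands in for the dataset. It assumes the same convention as the example above, where `transposition[n]` is the dataset axis shown as axis `n` of the view.

```python
def to_dataset_order(transposition, view_index):
    """Reorder a view index into the index order of the underlying dataset.

    transposition[n] is the dataset axis that appears as axis n of the view.
    (Hypothetical helper, for illustration only.)
    """
    dataset_index = [None] * len(transposition)
    for view_axis, dataset_axis in enumerate(transposition):
        dataset_index[dataset_axis] = view_index[view_axis]
    return tuple(dataset_index)


# Tiny 2 x 3 x 4 "dataset" built from nested lists (stand-in for an h5py dataset).
data = [[[100 * i + 10 * j + k for k in range(4)]
         for j in range(3)]
        for i in range(2)]

i, j, k = 1, 2, 3
# With transposition (2, 0, 1) the view is indexed as [k, i, j];
# reordering that index recovers the dataset index (i, j, k).
a, b, c = to_dataset_order((2, 0, 1), (k, i, j))
assert (a, b, c) == (i, j, k)
assert data[a][b][c] == 100 * i + 10 * j + k
```

Because only the index tuple is rearranged, slices can be passed through in the same way, which is what lets the underlying dataset read just the requested part.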

My class is defined as follows:

import numpy


class TransposedDatasetView(object):
    """
    This class provides a way to transpose a dataset without
    casting it into a numpy array. This way, the dataset in a file need not
    be read into memory in its entirety to view it in a different
    transposition.

    .. note::
        Performance depends a lot on the way the dataset was written
        to file. Depending on the chunking strategy, reading a complete 2D slice
        in an unfavorable direction may still require the entire dataset to
        be read from disk.

    :param dataset: h5py dataset
    :param transposition: List of dimension numbers in the wanted order
    """
    def __init__(self, dataset, transposition=None):
        super(TransposedDatasetView, self).__init__()
        self.dataset = dataset
        """original dataset"""

        self.shape = dataset.shape
        """Tuple of array dimensions"""
        self.dtype = dataset.dtype
        """Data-type of the array's elements"""
        self.ndim = len(dataset.shape)
        """Number of array dimensions"""

        size = 0
        if self.ndim:
            size = 1
            for dimsize in self.shape:
                size *= dimsize
        self.size = size
        """Number of elements in the array."""

        self.transposition = list(range(self.ndim))
        """List of dimension indices, in an order depending on the
        specified transposition. By default this is simply
        [0, ..., self.ndim - 1], but it can be changed by specifying a
        different `transposition` parameter at initialization.

        Use :meth:`transpose`, to create a new :class:`TransposedDatasetView`
        with a different :attr:`transposition`.
        """

        if transposition is not None:
            assert len(transposition) == self.ndim
            assert set(transposition) == set(range(self.ndim)), \
                "Transposition must be a list containing all dimensions"
            # store as a list so that comparisons against
            # list(range(self.ndim)) also work when a tuple is passed
            self.transposition = list(transposition)
            self.__sort_shape()

    def __sort_shape(self):
        """Sort shape in the order defined in :attr:`transposition`
        """
        new_shape = tuple(self.shape[dim] for dim in self.transposition)
        self.shape = new_shape

    def __sort_indices(self, indices):
        """Return array indices sorted in the order needed
        to access data in the original non-transposed dataset.

        :param indices: Tuple of ndim indices, in the order needed
            to access the view
        :return: Sorted tuple of indices, to access original data
        """
        assert len(indices) == self.ndim
        sorted_indices = tuple(idx for (_, idx) in
                               sorted(zip(self.transposition, indices)))
        return sorted_indices

    def __getitem__(self, item):
        """Handle fancy indexing with regards to the dimension order as
        specified in :attr:`transposition`

        The supported fancy-indexing syntax is explained at
        http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing.

        Additional restrictions exist if the data has been transposed:

            - numpy boolean array indexing is not supported
            - ellipsis objects are not supported

        :param item: Index, possibly fancy index (must be supported by h5py)
        :return: Data read from the dataset, with axes in the view's order
        """
        # no transposition, let the original dataset handle indexing
        if self.transposition == list(range(self.ndim)):
            return self.dataset[item]

        # 1-D slicing -> n-D slicing (n=1)
        if not hasattr(item, "__len__"):
            # first dimension index is given
            item = [item]
            # following dimensions are indexed with : (all elements);
            # slice(None) replaces sys.maxint, which only exists in Python 2
            item += [slice(None) for _i in range(self.ndim - 1)]

        # n-dimensional slicing
        if len(item) != self.ndim:
            raise IndexError(
                "N-dim slicing requires a tuple of N indices/slices. " +
                "Needed dimensions: %d" % self.ndim)

        # get list of indices sorted in the original dataset order
        sorted_indices = self.__sort_indices(item)

        output_data_not_transposed = self.dataset[sorted_indices]

        # now we must transpose the output data
        output_dimensions = []
        frozen_dimensions = []
        for i, idx in enumerate(item):
            # slices and sequences
            if not isinstance(idx, int):
                output_dimensions.append(self.transposition[i])
            # regular integer index
            else:
                # whenever a dimension is fixed (indexed by an integer)
                # the number of output dimensions is reduced
                frozen_dimensions.append(self.transposition[i])

        # decrement output dimensions that are above frozen dimensions
        for frozen_dim in reversed(sorted(frozen_dimensions)):
            for i, out_dim in enumerate(output_dimensions):
                if out_dim > frozen_dim:
                    output_dimensions[i] -= 1

        assert (len(output_dimensions) + len(frozen_dimensions)) == self.ndim
        assert set(output_dimensions) == set(range(len(output_dimensions)))

        return numpy.transpose(output_data_not_transposed,
                               axes=output_dimensions)

    def __array__(self, dtype=None):
        """Cast the dataset into a numpy array, and return it.

        If a transposition has been done on this dataset, return
        a transposed view of a numpy array."""
        return numpy.transpose(numpy.array(self.dataset, dtype=dtype),
                               self.transposition)

    def transpose(self, transposition=None):
        """Return a re-ordered (dimensions permuted)
        :class:`TransposedDatasetView`.

        The returned object refers to
        the same dataset but with a different :attr:`transposition`.

        :param list[int] transposition: List of dimension numbers in the
            wanted order
        :return: Transposed TransposedDatasetView
        """
        # by default, reverse the dimensions
        if transposition is None:
            transposition = list(reversed(self.transposition))

        return TransposedDatasetView(self.dataset,
                                     transposition)
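
The least obvious step in `__getitem__` is working out which axes to pass to `numpy.transpose` once integer indices have removed some dimensions from the data that was read. A compact way to see it, as a sketch only: `output_axes` below is a hypothetical helper name, not a method of the class, and it restates the "decrement above frozen dimensions" loop as a ranking of the dataset axes that survive.

```python
def output_axes(transposition, view_index):
    """Axes to pass to numpy.transpose after reading in dataset order.

    Integer indices drop an axis from the data that is read. The axes that
    remain are renumbered 0..m-1 in dataset order, then listed in view order.
    (Hypothetical helper, for illustration only.)
    """
    # dataset axes that survive, in the order the view wants them
    kept = [dataset_axis
            for dataset_axis, idx in zip(transposition, view_index)
            if not isinstance(idx, int)]
    # position of each surviving dataset axis in the data actually read
    rank = {axis: n for n, axis in enumerate(sorted(kept))}
    return [rank[axis] for axis in kept]


full = slice(None)

# Fixing view axis 0 freezes dataset axis 2; dataset axes 0 and 1 remain
# and are already in the order the view expects.
assert output_axes((2, 0, 1), (5, full, full)) == [0, 1]

# Fixing view axis 1 freezes dataset axis 0; dataset axes 2 and 1 remain,
# so the two axes of the data read from disk must be swapped.
assert output_axes((2, 0, 1), (full, 5, full)) == [1, 0]

# Nothing frozen: the axes are simply the transposition itself.
assert output_axes((2, 0, 1), (full, full, full)) == [2, 0, 1]
```

Like the class, this sketch only treats plain Python `int` values as integer indices; other integer types would be counted as kept axes.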

Regarding python - transposing an h5py dataset without using a numpy array, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41181114/
