
python - Dask array to zarr with unknown shape

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:36:06

I am trying to store a dask array in a zarr file.

I have managed to do this when the dask array has a defined shape.


import dask
import dask.array as da
import numpy as np
from tempfile import TemporaryDirectory
import zarr


np_array = np.random.randint(1, 10, size=1000)
array = da.from_array(np_array)

with TemporaryDirectory() as tmpdir:
    delayed = da.to_zarr(array, url=tmpdir,
                         compute=False, component='/data')
    dask.compute(delayed)

    z_object = zarr.open_group(tmpdir, mode='r')

    assert np.all(np_array == z_object.data[:])

However, if I apply a boolean-mask selection to the dask array, the shape is lost and zarr complains about NaNs in the shape.

# this will fail

np_array = np.random.randint(1, 10, size=1000)
array = da.from_array(np_array)

array = array[array > 5]

with TemporaryDirectory() as tmpdir:
    delayed = da.to_zarr(array, url=tmpdir,
                         compute=False, component='/data')
    dask.compute(delayed)

    z_object = zarr.open_group(tmpdir, mode='r')

    assert np.all(np_array[np_array > 5] == z_object.data[:])

This is the error that is raised:

Traceback (most recent call last):
  File "/home/peio/devel/variation/variation6/variation6/tests/test_zarr.py", line 38, in <module>
    without_shape()
  File "/home/peio/devel/variation/variation6/variation6/tests/test_zarr.py", line 29, in without_shape
    compute=False, component='/data')
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/dask/array/core.py", line 2808, in to_zarr
    **kwargs
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/creation.py", line 120, in create
    chunk_store=chunk_store, filters=filters, object_codec=object_codec)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/storage.py", line 323, in init_array
    object_codec=object_codec)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/storage.py", line 343, in _init_array_metadata
    shape = normalize_shape(shape) + dtype.shape
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/util.py", line 58, in normalize_shape
    shape = tuple(int(s) for s in shape)
  File "/home/peio/devel/variation/pyenv3/lib/python3.7/site-packages/zarr/util.py", line 58, in <genexpr>
    shape = tuple(int(s) for s in shape)
ValueError: cannot convert float NaN to integer

Is there a way to store a dask array with an unknown shape in a zarr file?

Thanks in advance!

Best answer

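Zarr expects chunk shapes to be uniform and known in advance. Dask currently facilitates this by rechunking arrays to make them uniform. However, array[array > 5] creates a Dask array with unknown chunk shapes: how many elements pass the mask in each chunk is not known until the mask is evaluated. Because that information does not exist yet, the array cannot be rechunked into uniform chunks beforehand. That said, we could explain this better.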
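To see where the NaN in the traceback comes from, it can help to inspect the shape and chunks before and after the mask. This is a minimal sketch; the small input array and the chunk size of 5 are arbitrary choices for illustration, not values from the question.

```python
import math

import dask.array as da
import numpy as np

# Small illustrative input, split into two chunks of 5 elements.
np_array = np.arange(10)
array = da.from_array(np_array, chunks=5)

# Boolean-mask selection: Dask cannot know how many elements
# survive in each chunk without computing the mask.
masked = array[array > 5]

print(array.chunks)   # ((5, 5),)
print(masked.shape)   # (nan,)
print(masked.chunks)  # ((nan, nan),)
```

Those NaN entries are what zarr later tries to convert with int(s), which produces the ValueError shown above.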

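One way around this is to use a Dask operation that returns known chunk shapes (as David suggested). Alternatively, the chunk shapes can be determined before storing (at the cost of computing). We could also discuss extending Zarr to handle this case, but that is a longer-term solution.

Regarding python - Dask array to zarr with unknown shape, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57162752/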
