
python - astropy.io.fits: read rows from a large FITS file with multiple HDUs

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:03:04

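I have a FITS file of about 50 GB containing multiple HDUs, all of which have the same format: a (1E5 x 1E6) array with 1E5 objects and 1E6 timestamps. The HDUs describe different physical properties such as Flux, RA, DEC, etc. I only want to read 5 objects from each HDU (i.e. a (5 x 1E6) array).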

Python 2.7, astropy 1.0.3, Linux x86_64

So far I have tried many of the suggestions I found, but nothing has worked. My best approach is still:

import numpy as np
from astropy.io import fits

# The five objects I want to read out
obj_list = ['Star1', 'Star15', 'Star700', 'Star2000', 'Star5000']
dic = {}

with fits.open(fname, memmap=True, do_not_scale_image_data=True) as hdulist:

    # There is a special HDU 'OBJECTS', a (1E5 x 1) array that records which row index in the FITS file corresponds to which object.

    # First, get the indices of the rows that describe the objects in the FITS file (not necessarily in order!)
    ind_objs = np.in1d(hdulist['OBJECTS'].data, obj_list, assume_unique=True).nonzero()[0]  # indices of the candidates

    # Second, read out the time series of the 5 objects
    dic['FLUX'] = hdulist['FLUX'].data[ind_objs]  # (5 x 1E6) array
    dic['RA'] = hdulist['RA'].data[ind_objs]  # (5 x 1E6) array
    dic['DEC'] = hdulist['DEC'].data[ind_objs]  # (5 x 1E6) array

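This code works well and runs fast for files up to about 20 GB, but runs out of memory for larger ones (larger files only contain more objects, not more timestamps). I don't understand why. As far as I know, astropy.io.fits uses mmap under the hood and should only load the (5 x 1E6) array into memory? Regardless of file size, what I want to read out always has the same size.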

Edit - here is the error message:

  dic['RA'] = hdulist['RA'].data[ind_objs] # (5 x 1E6) array
File "/usr/local/python/lib/python2.7/site-packages/astropy-1.0.3-py2.7-linux-x86_64.egg/astropy/utils/decorators.py", line 341, in __get__
val = self._fget(obj)
File "/usr/local/python/lib/python2.7/site-packages/astropy-1.0.3-py2.7-linux-x86_64.egg/astropy/io/fits/hdu/image.py", line 239, in data
data = self._get_scaled_image_data(self._data_offset, self.shape)
File "/usr/local/python/lib/python2.7/site-packages/astropy-1.0.3-py2.7-linux-x86_64.egg/astropy/io/fits/hdu/image.py", line 585, in _get_scaled_image_data
raw_data = self._get_raw_data(shape, code, offset)
File "/usr/local/python/lib/python2.7/site-packages/astropy-1.0.3-py2.7-linux-x86_64.egg/astropy/io/fits/hdu/base.py", line 523, in _get_raw_data
return self._file.readarray(offset=offset, dtype=code, shape=shape)
File "/usr/local/python/lib/python2.7/site-packages/astropy-1.0.3-py2.7-linux-x86_64.egg/astropy/io/fits/file.py", line 248, in readarray
shape=shape).view(np.ndarray)
File "/usr/local/python/lib/python2.7/site-packages/numpy/core/memmap.py", line 254, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap.error: [Errno 12] Cannot allocate memory

Edit 2: Thanks, I have now incorporated the suggestions, which lets me handle FITS files of up to 50 GB. New code:

import numpy as np
from astropy.io import fits

# The five objects I want to read out
obj_list = ['Star1', 'Star15', 'Star700', 'Star2000', 'Star5000']
dic = {}

with fits.open(fname, mode='denywrite', memmap=True, do_not_scale_image_data=True) as hdulist:

    # There is a special HDU 'OBJECTS', a (1E5 x 1) array that records which row index in the FITS file corresponds to which object.

    # First, get the indices of the rows that describe the objects in the FITS file (not necessarily in order!)
    ind_objs = np.in1d(hdulist['OBJECTS'].data, obj_list, assume_unique=True).nonzero()[0]  # indices of the candidates

    # Second, read out the time series of the 5 objects, dropping each HDU's cached data afterwards
    dic['FLUX'] = hdulist['FLUX'].data[ind_objs]  # (5 x 1E6) array
    del hdulist['FLUX'].data
    dic['RA'] = hdulist['RA'].data[ind_objs]  # (5 x 1E6) array
    del hdulist['RA'].data
    dic['DEC'] = hdulist['DEC'].data[ind_objs]  # (5 x 1E6) array
    del hdulist['DEC'].data

mode='denywrite' made no difference.

memmap=True is indeed not the default and has to be set manually.

del hdulist['FLUX'].data etc. now lets me read files of up to 50 GB instead of 20 GB.

New problem: anything larger than 50 GB still causes the same memory error, but now directly on the first line:

dic['FLUX'] = hdulist['FLUX'].data[ind_objs] # (5 x 1E6) array

Best answer

It looks like you have run into this issue: https://github.com/astropy/astropy/issues/1380

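The problem here is that even though it uses mmap, it uses mmap in copy-on-write mode. That means your system must be able to allocate a virtual memory region large enough to hold, in principle, as much data as the size of the mmap, in case you write data back to the mmap.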
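To make the copy-on-write versus read-only distinction concrete, here is a minimal sketch using numpy.memmap directly on a tiny temporary file. The function name demo_memmap_modes is made up for illustration, and the sketch does not reproduce the allocation failure itself; it only shows that a copy-on-write mapping accepts writes (which is why the system has to be prepared to back them), while a read-only mapping rejects them.

```python
import os
import tempfile

import numpy as np


def demo_memmap_modes():
    """Compare copy-on-write (mode="c") and read-only (mode="r") memmaps."""
    # A small raw file stands in for a FITS data block (illustration only).
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        np.arange(12, dtype=np.float32).tofile(path)

        # Copy-on-write: the assignment succeeds but never reaches the file.
        cow = np.memmap(path, dtype=np.float32, mode="c", shape=(3, 4))
        cow[0, 0] = 99.0
        on_disk = np.fromfile(path, dtype=np.float32)

        # Read-only: the assignment is rejected with a ValueError.
        ro = np.memmap(path, dtype=np.float32, mode="r", shape=(3, 4))
        try:
            ro[0, 0] = 99.0
            write_rejected = False
        except ValueError:
            write_rejected = True

        result = (float(cow[0, 0]), float(on_disk[0]), write_rejected)
        # Release the mappings before the temporary file is removed.
        del cow, ro
        return result
    finally:
        os.remove(path)
```

Calling demo_memmap_modes() returns the in-memory value after the write, the unchanged value on disk, and whether the read-only write was rejected.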

If you pass mode='denywrite' to fits.open(), it should work. Any attempt to modify the array will result in an error, but that is fine if you only want to read the data.
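As a sketch of how this suggestion could be packaged, the helper below opens the file with mode='denywrite', copies only the requested rows out of each named image HDU, and drops the cached mapping before moving on (the del step comes from the question's second edit). The function name read_rows and its parameters are hypothetical, not part of astropy.

```python
import numpy as np
from astropy.io import fits


def read_rows(fname, hdu_names, rows):
    """Return {hdu_name: selected rows} from image HDUs opened read-only."""
    out = {}
    # mode="denywrite" asks for a read-only mapping instead of copy-on-write.
    with fits.open(fname, mode="denywrite", memmap=True,
                   do_not_scale_image_data=True) as hdulist:
        for name in hdu_names:
            # Indexing with a list of rows copies just those rows into memory.
            out[name] = np.array(hdulist[name].data[rows])
            # Drop the cached memory-mapped array before the next HDU.
            del hdulist[name].data
    return out
```

With the question's variables, this would be called as read_rows(fname, ['FLUX', 'RA', 'DEC'], ind_objs).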

If you still can't get it to work, you can also try the fitsio module, which has better support for reading files in smaller chunks.
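The idea of reading in smaller chunks can be illustrated without any memory mapping at all. The sketch below is not fitsio's API; it is a plain numpy illustration that seeks to each requested row of a raw binary file and reads only that row. The function name read_rows_by_seek is hypothetical, and data_offset, n_cols and dtype are assumptions that, for a real FITS file, would have to be taken from the HDU header (FITS image data are big-endian, e.g. '>f4'). It also assumes each row is stored contiguously.

```python
import numpy as np


def read_rows_by_seek(path, rows, n_cols, dtype, data_offset=0):
    """Read selected rows of a row-contiguous 2-D array with explicit seeks."""
    dtype = np.dtype(dtype)
    row_bytes = n_cols * dtype.itemsize
    out = np.empty((len(rows), n_cols), dtype=dtype)
    with open(path, "rb") as f:
        for k, row in enumerate(rows):
            # Jump straight to the first byte of the requested row.
            f.seek(data_offset + int(row) * row_bytes)
            # Read exactly one row; nothing else is pulled into memory.
            out[k] = np.fromfile(f, dtype=dtype, count=n_cols)
    return out
```

Memory use here scales with the number of requested rows rather than with the file size, which is the property the question is after.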

Regarding "python - astropy.io.fits: read rows from a large FITS file with multiple HDUs", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35759713/
