
python - numpy: fromfile for gzipped file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:10:55

I am using numpy.fromfile to construct an array which I can pass to the pandas.DataFrame constructor:

import numpy as np
import pandas as pd

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]

    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })
    return pd.DataFrame(np.fromfile(file, dt))
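
As a sanity check of the record layout, here is a minimal sketch that writes a couple of records with this dtype to an uncompressed file and reads them back with numpy.fromfile. The sample values and the file name best.dat are made up for illustration and are not part of the original question.

```python
import os
import tempfile

import numpy as np

# Same field layout as in read_best_file above.
dt = np.dtype({
    'names': ['time', 'bid_size', 'bid_price', 'ask_size', 'ask_price'],
    'formats': ['u8', 'i4', 'f8', 'i4', 'f8'],
    'offsets': [0, 8, 12, 20, 24],
})

# Two invented records, purely for illustration.
sample = np.zeros(2, dtype=dt)
sample['time'] = [1, 2]
sample['bid_size'] = [10, 20]
sample['bid_price'] = [99.5, 99.75]
sample['ask_size'] = [5, 15]
sample['ask_price'] = [100.0, 100.25]

# Write the raw bytes to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'best.dat')  # hypothetical file name
    sample.tofile(path)
    file_size = os.path.getsize(path)
    loaded = np.fromfile(path, dtype=dt)

print(dt.itemsize)  # bytes per record
print(file_size)    # total bytes written for the two records
print(loaded.tobytes() == sample.tobytes())
```

The last field starts at offset 24 and is 8 bytes wide, so each record occupies 32 bytes on disk.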

I would like to extend this method to handle gzipped files.

According to the numpy.fromfile documentation, the first parameter is file:

file : file or str
Open file object or filename

As such, I added the following to check for a gzip file path:

if isinstance(file, str) and file.endswith(".gz"):
    file = gzip.open(file, "r")

However, when I try to pass this through to fromfile, I get an IOError:

IOError: first argument must be an open file

Question:

How can I call numpy.fromfile with a gzipped file?

Edit:

As requested in the comments, here is the implementation that checks for gzipped files:

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]

    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })

    if isinstance(file, str) and file.endswith(".gz"):
        file = gzip.open(file, "r")

    return pd.DataFrame(np.fromfile(file, dt))

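Best answer

I have successfully read arrays of raw binary data from gzipped files by feeding the result of read() into numpy.frombuffer(). This code works with Python 3.7.3, and perhaps with earlier versions as well.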

# Example: read short integers (signed) from gzipped raw binary file

import gzip
import numpy as np

fname_gzipped = 'my_binary_data.dat.gz'
raw_dtype = np.int16
with gzip.open(fname_gzipped, 'rb') as f:
    from_gzipped = np.frombuffer(f.read(), dtype=raw_dtype)

# Demonstrate equivalence with direct np.fromfile()
fname_raw = 'my_binary_data.dat'
from_raw = np.fromfile(fname_raw, dtype=raw_dtype)

# True
print('raw binary and gunzipped are the same: {}'.format(
    np.array_equiv(from_gzipped, from_raw)))

# False
wrong_dtype = np.uint8
binary_as_wrong_dtype = np.fromfile(fname_raw, dtype=wrong_dtype)
print('wrong dtype and gunzipped are the same: {}'.format(
    np.array_equiv(from_gzipped, binary_as_wrong_dtype)))
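
Applying the same idea to the structured dtype from the question might look like the sketch below. The names BEST_DTYPE and read_best_array, the sample values, and the temporary file names are assumptions made for illustration; they do not appear in the original question or answer. Note that the whole decompressed file is held in memory, and that an array built with numpy.frombuffer from a bytes object is read-only.

```python
import gzip
import os
import tempfile

import numpy as np

# Field layout copied from the question.
BEST_DTYPE = np.dtype({
    'names': ['time', 'bid_size', 'bid_price', 'ask_size', 'ask_price'],
    'formats': ['u8', 'i4', 'f8', 'i4', 'f8'],
    'offsets': [0, 8, 12, 20, 24],
})


def read_best_array(path):
    """Return the records stored at `path` as a structured array.

    Paths ending in ".gz" are decompressed in memory and parsed with
    np.frombuffer; any other path is handed to np.fromfile directly.
    """
    if isinstance(path, str) and path.endswith('.gz'):
        with gzip.open(path, 'rb') as f:
            # The result is read-only; call .copy() if it must be modified.
            return np.frombuffer(f.read(), dtype=BEST_DTYPE)
    return np.fromfile(path, dtype=BEST_DTYPE)


# Invented sample data, written once raw and once gzipped.
sample = np.zeros(2, dtype=BEST_DTYPE)
sample['time'] = [1, 2]
sample['bid_price'] = [99.5, 99.75]
sample['ask_price'] = [100.0, 100.25]

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, 'best.dat')  # hypothetical file names
    gz_path = raw_path + '.gz'
    sample.tofile(raw_path)
    with gzip.open(gz_path, 'wb') as f:
        f.write(sample.tobytes())
    plain = read_best_array(raw_path)
    unzipped = read_best_array(gz_path)

print(plain.tobytes() == unzipped.tobytes())
```

The returned array should be usable in place of the np.fromfile result in the question, for example as pd.DataFrame(read_best_array(path)).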

Regarding python - numpy: fromfile for gzipped file, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38060450/
