gpt4 book ai didi

带有字节序转换的 Python 文件 Slurp

转载 作者:太空狗 更新时间:2023-10-30 00:30:45 26 4
gpt4 key购买 nike

最近有人问how to do a file slurp in python ,并且接受的答案建议如下:

with open('x.txt') as x: f = x.read()

我将如何着手执行此操作以读取文件并转换数据的字节序表示形式?

例如,我有一个 1GB 的二进制文件,它只是一堆打包为大端字节序的单精度 float ,我想将其转换为小端字节序并转储到一个 numpy 数组中。下面是我为完成此操作而编写的函数以及调用它的一些实际代码。我使用 struct.unpack 进行字节序转换,并尝试使用 mmap 加快一切。

那么我的问题是,我是否通过 mmapstruct.unpack 正确使用了 slurp?有没有更清洁、更快的方法来做到这一点?现在我的工作很有效,但我真的很想学习如何做得更好。

提前致谢!

#!/usr/bin/python
from struct import unpack
import mmap
import numpy as np

def mmapChannel(arrayName, fileName, channelNo, line_count, sample_count):
"""
We need to read in the asf internal file and convert it into a numpy array.
It is stored as a single row, and is binary. Thenumber of lines (rows), samples (columns),
and channels all come from the .meta text file
Also, internal format files are packed big endian, but most systems use little endian, so we need
to make that conversion as well.
Memory mapping seemed to improve the ingestion speed a bit
"""
# memory-map the file, size 0 means whole file
# length = line_count * sample_count * arrayName.itemsize
print "\tMemory Mapping..."
with open(fileName, "rb") as f:
map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
map.seek(channelNo*line_count*sample_count*arrayName.itemsize)

for i in xrange(line_count*sample_count):
arrayName[0, i] = unpack('>f', map.read(arrayName.itemsize) )[0]

# Same method as above, just more verbose for the maintenance programmer.
# for i in xrange(line_count*sample_count): #row
# be_float = map.read(arrayName.itemsize) # arrayName.itemsize should be 4 for float32
# le_float = unpack('>f', be_float)[0] # > for big endian, < for little endian
# arrayName[0, i]= le_float

map.close()
return arrayName

print "Initializing the Amp HH HV, and Phase HH HV arrays..."
HHamp = np.ones((1, line_count*sample_count), dtype='float32')
HHphase = np.ones((1, line_count*sample_count), dtype='float32')
HVamp = np.ones((1, line_count*sample_count), dtype='float32')
HVphase = np.ones((1, line_count*sample_count), dtype='float32')



print "Ingesting HH_Amp..."
HHamp = mmapChannel(HHamp, 'ALPSRP042301700-P1.1__A.img', 0, line_count, sample_count)
print "Ingesting HH_phase..."
HHphase = mmapChannel(HHphase, 'ALPSRP042301700-P1.1__A.img', 1, line_count, sample_count)
print "Ingesting HV_AMP..."
HVamp = mmapChannel(HVamp, 'ALPSRP042301700-P1.1__A.img', 2, line_count, sample_count)
print "Ingesting HV_phase..."
HVphase = mmapChannel(HVphase, 'ALPSRP042301700-P1.1__A.img', 3, line_count, sample_count)

print "Reshaping...."
HHamp_orig = HHamp.reshape(line_count, -1)
HHphase_orig = HHphase.reshape(line_count, -1)
HVamp_orig = HVamp.reshape(line_count, -1)
HVphase_orig = HVphase.reshape(line_count, -1)

最佳答案

稍作修改@Alex Martelli's answer :

arr = numpy.fromfile(filename, numpy.dtype('>f4'))
# no byteswap is needed regardless of endianess of the machine

关于带有字节序转换的 Python 文件 Slurp,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1632673/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com