Python (Pandas): How to read data in chunks from a long file where the chunks are separated by a header and are not equal in length?

Reposted. Author: 行者123. Updated: 2023-12-01 04:15:20

I have the following data in a txt file:

00001+++00001   000031         12.8600    -1    7     BEAR

1990052418 276.00 18.80 0.00 12.86

1990052500 276.00 19.70 0.00 12.86

00002+++00002 000047 30.8700 995 22 LION

1990072206 318.10 8.80 1010.00 12.86

1990072212 316.80 8.90 1010.00 12.86

1990072218 315.40 9.30 1010.00 12.86

1990072300 313.60 9.30 1010.00 12.86

00003+++00003 000050 36.0100 973 37 BIRD

1990072412 285.00 34.00 1012.00 10.29

1990072418 286.20 33.60 1013.00 10.29

1990072500 287.30 33.00 1013.00 10.29

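How can I read each (unequal-length) block under each header separately in Python (Pandas)?

Best answer

Here is a solution that parses each block as it goes (without reading the whole file into memory):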

from io import StringIO

import pandas as pd


def read_fwfs(text_file):
    res = {}
    with open(text_file) as f:
        # The first line of the file is the header of the first block.
        header = f.readline()
        block = []
        for line in f:
            if "+++" in line:
                # End of block, so save it to dict as DataFrame.
                # Note: this next line could be pulled out as a function.
                res[header.split("+++")[0]] = pd.read_fwf(StringIO("".join(block)), header=None)
                # Reset variables.
                header = line
                block = []
            # Ignore blank lines.
            elif line != "\n":
                block.append(line)
        # Save the last block.
        # Note: See what I mean about being a function? Here it is again:
        res[header.split("+++")[0]] = pd.read_fwf(StringIO("".join(block)), header=None)
    return res

In [11]: d = read_fwfs("my_text_file.txt")
         d
Out[11]:
{'00001':             0    1     2  3      4
 0  1990052418  276  18.8  0  12.86
 1  1990052500  276  19.7  0  12.86,
 '00002':             0      1    2     3      4
 0  1990072206  318.1  8.8  1010  12.86
 1  1990072212  316.8  8.9  1010  12.86
 2  1990072218  315.4  9.3  1010  12.86
 3  1990072300  313.6  9.3  1010  12.86,
 '00003':             0      1     2     3      4
 0  1990072412  285.0  34.0  1012  10.29
 1  1990072418  286.2  33.6  1013  10.29
 2  1990072500  287.3  33.0  1013  10.29}

In [12]: d["00003"]
Out[12]:
            0      1     2     3      4
0  1990072412  285.0  34.0  1012  10.29
1  1990072418  286.2  33.6  1013  10.29
2  1990072500  287.3  33.0  1013  10.29
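
If a single table is more convenient than a dict of DataFrames, the blocks can be stacked with pd.concat, which accepts a dict and uses its keys as the outer index level. This is only a sketch and not part of the original answer: the helper name combine_blocks and the index level names "block" and "row" are made up for illustration, and the small dict below stands in for the result of read_fwfs.

```python
import pandas as pd


def combine_blocks(blocks):
    """Stack a dict of DataFrames into one frame indexed by (block, row).

    `blocks` is assumed to look like the dict returned by read_fwfs:
    header prefix -> DataFrame with default integer columns.
    """
    return pd.concat(blocks, names=["block", "row"])


# Hand-built stand-in for the dict returned by read_fwfs.
d = {
    "00001": pd.DataFrame([[1990052418, 276.0, 18.8, 0.0, 12.86],
                           [1990052500, 276.0, 19.7, 0.0, 12.86]]),
    "00003": pd.DataFrame([[1990072412, 285.0, 34.0, 1012.0, 10.29],
                           [1990072418, 286.2, 33.6, 1013.0, 10.29],
                           [1990072500, 287.3, 33.0, 1013.0, 10.29]]),
}

combined = combine_blocks(d)
# Rows of one block can still be selected by the outer index level.
bird_rows = combined.loc["00003"]
```

Keeping the header prefix as an index level means no information is lost compared with the dict, while group-wise operations such as combined.groupby(level="block") become available.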

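If you want to do more with the separator line, e.g. use it as the header, you can add that to the read_fwf part, for example by adding the latter half of the split header to the joined block. As I mentioned inline, it is probably a good idea to pull that part out as a function anyway.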
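
One possible way to pull the repeated part out as a function and keep the rest of the separator line is sketched below. It is a variation on the answer's code, not the author's own: the names parse_block and read_blocks are hypothetical, and because the question does not say what the header fields mean, they are kept as a plain list of strings rather than given column names.

```python
from io import StringIO

import pandas as pd


def parse_block(header, lines):
    """Turn one header line plus its data lines into (key, meta, DataFrame)."""
    key, rest = header.split("+++", 1)
    # Remaining header fields, kept as raw strings (their meaning is
    # not stated in the question, so no names are assigned here).
    meta = rest.split()
    df = pd.read_fwf(StringIO("".join(lines)), header=None)
    return key, meta, df


def read_blocks(text_file):
    """Read every '+++'-delimited block into a dict: key -> (meta, DataFrame)."""
    res = {}
    header, block = None, []
    with open(text_file) as f:
        for line in f:
            if "+++" in line:
                # A new header closes the previous block, if there was one.
                if header is not None and block:
                    key, meta, df = parse_block(header, block)
                    res[key] = (meta, df)
                header, block = line, []
            elif line.strip():
                # Skip blank lines; collect everything else as data.
                block.append(line)
    # Save the last block.
    if header is not None and block:
        key, meta, df = parse_block(header, block)
        res[key] = (meta, df)
    return res
```

Called as read_blocks("my_text_file.txt"), this returns the same DataFrames as read_fwfs, each paired with the fields of its header line. Unlike the original, it skips headers that have no data lines instead of passing an empty block to read_fwf.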

Regarding "Python (Pandas): How to read data in chunks from a long file where the chunks are separated by a header and are not equal in length?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34378053/
