
python - How to read irregular text data?

Reposted. Author: 行者123. Updated: 2023-12-04 14:51:07

I want to read text data into memory line by line and write it to a file in a different format. The data looks like this:

   15LI     aLI   15   9.34   5.31   5.53
15LI aLI 15 9.51 4.55 5.54
15LI aLI 15 9.45 4.30 5.47
15LI aLI 15 10.29 3.77 5.91
15LI aLI 15 -97.89 -21.55 5.47
15LI aLI 15 -97.85 -21.69 5.88
15LI aLI 15 -96.61 -21.03 5.24
15LI aLI 15-103.25 -9.02 5.24
15LI aLI 15-102.55 -9.73 5.07
15LI aLI 15-102.54 -9.70 5.64
15LI aLI 15-102.40 -9.68 5.54
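As you can see, the space between columns 3 and 4 disappears as the numbers get larger. I'm reading the data with numpy.genfromtxt, but it cannot read past the sixth line and throws the following error: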
 ValueError: Some errors were detected !
Line #7 (got 5 columns instead of 6)
Line #8 (got 5 columns instead of 6)
Line #9 (got 5 columns instead of 6)
Line #10 (got 5 columns instead of 6)
Is there a way in Python to insert a space between the columns, or to read them even when there is no space? Here is my short code:
import h5py
import numpy as np

# Define a np.dtype for the gro array/dataset (hard-coded for now)
gro_dt = np.dtype([('col1', 'S4'), ('col2', 'S4'), ('col3', int),
                   ('col4', float), ('col5', float), ('col6', float)])

# Next, create an empty .h5 file with the dtype
with h5py.File('pep.h5', 'w') as hdf:
    ds = hdf.create_dataset('dataset1', dtype=gro_dt, shape=(20,), maxshape=(None,))

    # Next, read line 1 of the .gro file and store it as an attribute
    with open('testing.dat', 'r') as f:
        ds.attrs["Source"] = f.readline()

    # Loop: read the file in chunks of `incr` rows until nothing is left
    skip, incr, row0 = 0, 20, 0
    read_gro = True
    while read_gro:
        arr = np.genfromtxt('testing.dat', skip_header=skip, max_rows=incr, dtype=gro_dt)
        # genfromtxt returns a 0-d array when only one row is read; force 1-d
        arr = np.atleast_1d(arr)
        rows = arr.shape[0]
        if rows == 0:
            read_gro = False
        else:
            # Grow the dataset if this chunk does not fit
            if row0 + rows > ds.shape[0]:
                ds.resize((row0 + rows,))
            ds[row0:row0 + rows] = arr
            skip += rows
            row0 += rows
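
A minimal sketch of another option, not part of the original post: np.genfromtxt also accepts a sequence of integers as `delimiter`, which it treats as fixed field widths, so a fused value such as `15-103.25` is cut by character position instead of being split on whitespace. The widths (7, 8, 5, 7, 7, 7) are an assumption measured from the first sample line, which appears to be the only line whose original spacing survived in this post, so verify them against the real file. `read_fixed_width` is a hypothetical helper, and the string fields use 'U4' (rather than 'S4') so they come back as str.

```python
import io

import numpy as np

# ASSUMPTION: every line uses these fixed column widths, measured from the
# first sample line "   15LI     aLI   15   9.34   5.31   5.53".
WIDTHS = (7, 8, 5, 7, 7, 7)

gro_dt = np.dtype([("col1", "U4"), ("col2", "U4"), ("col3", int),
                   ("col4", float), ("col5", float), ("col6", float)])


def read_fixed_width(source):
    """Slice each line by character position instead of splitting on whitespace."""
    # A sequence of ints as `delimiter` means "field widths";
    # autostrip=True removes the padding spaces around each field.
    arr = np.genfromtxt(source, delimiter=WIDTHS, dtype=gro_dt,
                        autostrip=True, encoding="utf-8")
    # A single-row input comes back as a 0-d array; make it 1-d either way.
    return np.atleast_1d(arr)


sample = (
    "   15LI     aLI   15   9.34   5.31   5.53\n"
    "   15LI     aLI   15-103.25  -9.02   5.24\n"
)
rows = read_fixed_width(io.StringIO(sample))
print(rows)
```

If the widths hold for the real file, the returned structured array can be written into the HDF5 dataset the same way `arr` is in the loop above.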

Best answer

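You can parse the data manually and then dump it into whatever format you need. This answer relies on your statement that "the first 3 columns stay the same".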

import pandas as pd

def gen_lines(filename):
    wanted = 6
    with open(filename, "r") as fin:
        for line in fin:
            parts = line.split()
            if len(parts) < wanted:
                # Columns 3 and 4 are fused (e.g. "15-103.25"): split them
                # apart and restore the minus sign on the second value
                first = parts[:2]
                final = parts[3:]
                middle = parts[2].split("-")
                middle[-1] = "-" + middle[-1]
                yield first + middle + final
            else:
                yield parts

lines = gen_lines("foo.txt")
df = pd.DataFrame(lines)

print(df)
       0    1   2        3       4     5
0   15LI  aLI  15     9.34    5.31  5.53
1   15LI  aLI  15     9.51    4.55  5.54
2   15LI  aLI  15     9.45    4.30  5.47
3   15LI  aLI  15    10.29    3.77  5.91
4   15LI  aLI  15   -97.89  -21.55  5.47
5   15LI  aLI  15   -97.85  -21.69  5.88
6   15LI  aLI  15   -96.61  -21.03  5.24
7   15LI  aLI  15  -103.25   -9.02  5.24
8   15LI  aLI  15  -102.55   -9.73  5.07
9   15LI  aLI  15  -102.54   -9.70  5.64
10  15LI  aLI  15  -102.40   -9.68  5.54
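
The answer's generator only repairs a fusion between columns 3 and 4, and it leaves every column as a string. A more general sketch, assuming a fused boundary always shows up as a digit immediately followed by a minus sign, inserts the missing space with a regular expression before splitting and then converts the numeric columns. `split_fields`, `read_irregular` and the column names are hypothetical.

```python
import io
import re

import pandas as pd

# ASSUMPTION: a fused boundary is always a digit directly followed by "-"
# (e.g. "15-103.25"). A minus inside a single number only follows "e"/"E"
# (exponent) or starts the field, so those cases are left untouched.
FUSED_MINUS = re.compile(r"(?<=\d)-")

# Hypothetical column names, mirroring the question's dtype
COLUMNS = ["col1", "col2", "col3", "col4", "col5", "col6"]


def split_fields(line):
    """Insert the missing space before each fused minus sign, then split."""
    return FUSED_MINUS.sub(" -", line).split()


def read_irregular(source):
    """Parse an iterable of text lines into a typed DataFrame."""
    rows = [split_fields(line) for line in source if line.strip()]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Columns 3-6 are numeric; convert them from strings
    for col in ("col3", "col4", "col5", "col6"):
        df[col] = pd.to_numeric(df[col])
    return df


sample = io.StringIO(
    "15LI aLI 15 -97.89 -21.55 5.47\n"
    "15LI aLI 15-103.25 -9.02 5.24\n"
    "15LI aLI 15-102.55-9.73 5.07\n"  # two fused boundaries on one line
)
df = read_irregular(sample)
print(df)
```

Unlike a fixed-width reader, this does not depend on exact column widths, but it would misfire on any field that legitimately contains a digit followed by a hyphen.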

A similar question, "python - How to read irregular text data?", can be found on Stack Overflow: https://stackoverflow.com/questions/69090126/
