python - How do I read complex data in Python?

Reposted by: 行者123 · Updated: 2023-12-04 11:48:10

I am trying to read poorly structured data. It looks like this:

Generated by trjconv : P/L=1/400 t=   0.00000
11214
    1P1     aP1    1  80.48  35.36   4.25
    2P1     aP1    2  37.45   3.92   3.96
    3P2     aP2    3  18.53  -9.69   4.68
    4P2     aP2    4  55.39  74.34   4.60
    5P3     aP3    5  22.11  68.71   3.85
    6P3     aP3    6  -4.13  24.04   3.73
    7P4     aP4    7  40.16   6.39   4.73
    8P4     aP4    8  -5.40  35.73   4.85
    9P5     aP5    9  36.67  22.45   4.08
   10P5     aP5   10  -3.68 -10.66   4.18
Generated by trjconv : P/L=1/400 t=   1000.000
11214
    1P1     aP1    1  80.48  35.36   4.25
    2P1     aP1    2  37.45   3.92   3.96
    3P2     aP2    3  18.53  -9.69   4.68
    4P2     aP2    4  55.39  74.34   4.60
    5P3     aP3    5  22.11  68.71   3.85
    6P3     aP3    6  -4.13  24.04   3.73
    7P4     aP4    7  40.16   6.39   4.73
    8P4     aP4    8  -5.40  35.73   4.85
    9P5     aP5    9  36.67  22.45   4.08
   10P5     aP5   10  -3.68 -10.66   4.18
Generated by trjconv : P/L=1/400 t=   2000.000
11214
    1P1     aP1    1  80.48  35.36   4.25
    2P1     aP1    2  37.45   3.92   3.96
    3P2     aP2    3  18.53  -9.69   4.68
    4P2     aP2    4  55.39  74.34   4.60
    5P3     aP3    5  22.11  68.71   3.85
    6P3     aP3    6  -4.13  24.04   3.73
    7P4     aP4    7  40.16   6.39   4.73
    8P4     aP4    8  -5.40  35.73   4.85
    9P5     aP5    9  36.67  22.45   4.08
   10P5     aP5   10  -3.68 -10.66   4.18
Generated by trjconv : P/L=1/400 t=   3000.000
11214
    1P1     aP1    1  80.48  35.36   4.25
    2P1     aP1    2  37.45   3.92   3.96
    3P2     aP2    3  18.53  -9.69   4.68
    4P2     aP2    4  55.39  74.34   4.60
    5P3     aP3    5  22.11  68.71   3.85
    6P3     aP3    6  -4.13  24.04   3.73
    7P4     aP4    7  40.16   6.39   4.73
    8P4     aP4    8  -5.40  35.73   4.85
    9P5     aP5    9  36.67  22.45   4.08
   10P5     aP5   10  -3.68 -10.66   4.18

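The file consists of frames, each stamped with its own time. What I show here is only a sample; the whole file is about 50 GB, so it would be best to read it line by line or in chunks. But I don't know how to handle the header of each frame. Is there a way to get rid of these headers? For now I am using the following approach: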

import numpy as np

#define a np.dtype for gro array/dataset (hard-coded for now)
gro_dt = np.dtype([('col1', 'S4'), ('col2', 'S4'), ('col3', int), 
                   ('col4', float), ('col5', float), ('col6', float)])

file = np.genfromtxt('sample.gro', skip_header = 2, dtype=gro_dt)

But when it reaches the next header, it raises the following error.

ValueError: Some errors were detected !
    Line #13 (got 7 columns instead of 6)
    Line #14 (got 1 columns instead of 6)
    Line #25 (got 7 columns instead of 6)
    Line #26 (got 1 columns instead of 6)
    Line #37 (got 7 columns instead of 6)
    Line #38 (got 1 columns instead of 6)

Best answer

Write an adapter that strips out the periodic headers.

def adapt(f):
    for line in f:
        if line.startswith("Generated"):
            print(line, end='')
            # Consume the following line as well.
            # If your data is well behaved, you can 
            # assume the following line exists and should be
            # skipped, instead of using the try statement.
            try:
                print(next(f), end='')
            except StopIteration:
                pass
            continue
        yield line

with open('sample.gro') as f:
    file = np.genfromtxt(adapt(f), dtype=gro_dt)
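
Since the question mentions a file of about 50 GB, note that `np.genfromtxt` still builds one array for everything it is given. A possible variation, sketched here under assumptions, is to group the lines by frame so that only one frame is held in memory at a time. The helper names `iter_frames` and `frame_time` are hypothetical (they are not part of the original answer), and the sketch assumes every frame starts with a "Generated" line followed by exactly one count line, as in the sample.

```python
import io


def iter_frames(f):
    """Yield (header, data_lines) for one frame at a time.

    Assumes f is a file-like iterator of text lines. A frame starts at a
    line beginning with "Generated"; the line right after it (the count
    line) is skipped, as in the adapter above.
    """
    header, rows = None, []
    for line in f:
        if line.startswith("Generated"):
            if header is not None:
                yield header, rows
            header, rows = line.strip(), []
            next(f, None)  # skip the count line; tolerate a truncated file
        elif line.strip():
            rows.append(line)
    if header is not None:
        yield header, rows


def frame_time(header):
    """Parse the number after 't=' in a header line (assumed layout)."""
    return float(header.rsplit("t=", 1)[1])


# Tiny in-memory stand-in for the real file: two frames, two rows each.
sample = io.StringIO(
    "Generated by trjconv : P/L=1/400 t=   0.00000\n"
    "11214\n"
    "    1P1     aP1    1  80.48  35.36   4.25\n"
    "    2P1     aP1    2  37.45   3.92   3.96\n"
    "Generated by trjconv : P/L=1/400 t=   1000.000\n"
    "11214\n"
    "    1P1     aP1    1  80.48  35.36   4.25\n"
    "    2P1     aP1    2  37.45   3.92   3.96\n"
)

frames = [(frame_time(h), len(rows)) for h, rows in iter_frames(sample)]
print(frames)
```

With the real file, each `rows` list could be passed to `np.genfromtxt(rows, dtype=gro_dt)`, which accepts a list of strings, and the resulting array processed or written out before the next frame is read.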

Regarding "python - How do I read complex data in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69064631/
