
Python: extracting data from a file into a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 00:33:06

I opened a generic index file:

f = open(indexfile, "r")

The resulting object is an _io.TextIOWrapper, and the file contents look like this:

GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:1
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Joe
GROUP_OFFSET:0
GROUP_LENGTH:1234
GROUP_FILENAME:/tmp/something1
GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:2
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Jenny
GROUP_OFFSET:1235
GROUP_LENGTH:12
GROUP_FILENAME:/tmp/something2

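Some data fields have to be extracted by combining a _NAME line with the _VALUE line that follows it, while others only need the name on the line itself (_OFFSET, _LENGTH, _FILENAME). For example, by looping over each line and filling lists, like this: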

import pandas as pd

ID = []
NAME = []
GROUP_LENGTH = []
GROUP_OFFSET = []
GROUP_FILENAME = []

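for line in f:
    # Pseudocode: if the line is GROUP_OFFSET, append its value to the GROUP_OFFSET list.
    # Pseudocode: if the line is GROUP_FIELD_NAME:ID, append the GROUP_FIELD_VALUE
    # from the next line to the ID list.
    pass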
if GROUP_FIELD_NAME:ID then add GROUP_FIELD_VALUE from next line


a = {'ID': ID,
     'NAME': NAME,
     'GROUP_LENGTH': GROUP_LENGTH,
     'GROUP_OFFSET': GROUP_OFFSET,
     'GROUP_FILENAME': GROUP_FILENAME
     }

df = pd.DataFrame.from_dict(a, orient='index')

df = df.transpose()

How can I get something like this:

ID  NAME   GROUP_LENGTH  GROUP_OFFSET  GROUP_FILENAME
1   Joe    1234          0             /tmp/something1
2   Jenny  12            1235          /tmp/something2

Best answer

Use collections.OrderedDict objects to accumulate the records, one per group:

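import pandas as pd
from collections import OrderedDict

with open('input.ind') as f:
    records = []
    for line in f:
        # Split on the first colon only, so values containing ':' stay intact.
        name, val = line.strip().split(':', 1)
        if name == 'GROUP_FIELD_NAME':
            # An ID field marks the start of a new record.
            if val == 'ID':
                records.append(OrderedDict())
            # The value for this field is on the next line.
            records[-1][val] = next(f).strip().split(':', 1)[1]
        else:
            # GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME carry their own value.
            records[-1][name] = val

df = pd.DataFrame(records)
print(df)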

Expected output:

  ID   NAME GROUP_OFFSET GROUP_LENGTH   GROUP_FILENAME
0  1    Joe            0         1234  /tmp/something1
1  2  Jenny         1235           12  /tmp/something2
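
The same pairing logic can also be sketched without calling next() inside the loop, by remembering which field name is still waiting for its value. This is only an illustrative variant, not part of the original answer: the function name parse_index and the in-memory SAMPLE text are made up for the example, and it assumes, like the answer, that a GROUP_FIELD_NAME:ID line opens every record. Plain dicts are used because they keep insertion order in Python 3.7 and later.

```python
import io

# Hypothetical sample data, copied from the index file shown in the question.
SAMPLE = """\
GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:1
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Joe
GROUP_OFFSET:0
GROUP_LENGTH:1234
GROUP_FILENAME:/tmp/something1
GROUP_FIELD_NAME:ID
GROUP_FIELD_VALUE:2
GROUP_FIELD_NAME:NAME
GROUP_FIELD_VALUE:Jenny
GROUP_OFFSET:1235
GROUP_LENGTH:12
GROUP_FILENAME:/tmp/something2
"""


def parse_index(lines):
    """Return one dict per record from an iterable of index lines."""
    records = []
    pending = None  # field name still waiting for its GROUP_FIELD_VALUE line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, _, val = line.partition(':')  # split on the first colon only
        if key == 'GROUP_FIELD_NAME':
            if val == 'ID':  # assumption: an ID field opens every record
                records.append({})
            pending = val
        elif key == 'GROUP_FIELD_VALUE':
            records[-1][pending] = val
            pending = None
        else:
            # GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME carry their own value
            records[-1][key] = val
    return records


records = parse_index(io.StringIO(SAMPLE))
print(records)
```

The resulting list can be passed to pd.DataFrame(records) exactly as in the answer. All values are still strings at this point, so numeric columns such as GROUP_OFFSET would need converting separately if arithmetic is required.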

This question about extracting data from a file into a DataFrame in Python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/58027078/
