
python - Parsing a text file in Python and converting it to a pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:32:41

I am trying to parse a text file and convert it into a pandas DataFrame. The file (including the blank lines):

HEADING1
value 1

HEADING2
value 2

HEADING1
value 11

HEADING2
value 12

should be converted into a DataFrame:

HEADING1, HEADING2
value 1, value 2
value 11, value 12

I tried the code below. However, I am not sure whether using converters is feasible here.

df = pd.read_table(textfile, header=None, skip_blank_lines=True, delimiter='\n',
# converters= 'what should I use?',
names= 'HEADING1, HEADING2'.split() )

Best answer

Parse the text yourself and split on '\n\n'

# split file by `'\n\n'` to get rows
# split again by `'\n'` to get columns
# `zip` to get convenient lists of headers and values
# the `with` block makes sure the file is closed after reading
with open(textfile) as f:
    cols, vals = zip(
        *[line.split('\n') for line in f.read().split('\n\n')]
    )

# construct a `pd.Series`
# note: your index contained in the `cols` list will not be unique
s = pd.Series(vals, cols)

# we'll need to enumerate the duplicated index values so that we can unstack
# we do this by creating a `pd.MultiIndex` with `cumcount` then the header values
s.index = [s.groupby(level=0).cumcount(), s.index]

# finally, `unstack`
s.unstack()

   HEADING1  HEADING2
0   value 1   value 2
1  value 11  value 12

Breakdown

List comprehension

[line.split('\n') for line in StringIO(txt).read().split('\n\n')]

[['HEADING1', 'value 1'],
['HEADING2', 'value 2'],
['HEADING1', 'value 11'],
['HEADING2', 'value 12']]

Using zip

list(zip(*[line.split('\n') for line in StringIO(txt).read().split('\n\n')]))

[('HEADING1', 'HEADING2', 'HEADING1', 'HEADING2'),
('value 1', 'value 2', 'value 11', 'value 12')]
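The transposition that `zip(*...)` performs can be seen in isolation, without any file handling. The following is a minimal sketch; the `pairs` list is hand-written sample data mirroring the output above, and the names `headers` and `values` are chosen here for illustration only.

```python
# Hand-written sample data mirroring the [header, value] pairs above.
pairs = [
    ["HEADING1", "value 1"],
    ["HEADING2", "value 2"],
    ["HEADING1", "value 11"],
    ["HEADING2", "value 12"],
]

# The * operator passes each pair to zip as a separate argument, so zip
# groups all first elements together and all second elements together.
headers, values = zip(*pairs)

print(headers)
print(values)
```

Note that `zip` stops at the shortest input, so a block with only one line (for example an empty block produced by a trailing blank line) would leave a single tuple, and unpacking into two names would then raise a `ValueError`.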

Setting cols, vals

cols, vals = zip(*[line.split('\n') for line in StringIO(txt).read().split('\n\n')])

print(cols)
print()
print(vals)

('HEADING1', 'HEADING2', 'HEADING1', 'HEADING2')

('value 1', 'value 2', 'value 11', 'value 12')

Making the Series

s = pd.Series(vals, cols)
s

HEADING1     value 1
HEADING2     value 2
HEADING1    value 11
HEADING2    value 12
dtype: object

Enumerating the index values

s.index = [s.groupby(level=0).cumcount(), s.index]
s

0  HEADING1     value 1
   HEADING2     value 2
1  HEADING1    value 11
   HEADING2    value 12
dtype: object
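To see why this step makes the index unique, it may help to look at what `cumcount` returns on its own. This is a small sketch that rebuilds the same Series by hand; the variable name `occurrence` is introduced here for illustration.

```python
import pandas as pd

# Rebuild the Series from the previous step by hand.
s = pd.Series(
    ["value 1", "value 2", "value 11", "value 12"],
    index=["HEADING1", "HEADING2", "HEADING1", "HEADING2"],
)

# Group by the index labels; cumcount numbers the rows inside each
# group starting at 0, i.e. "this is the n-th time this header appears".
occurrence = s.groupby(level=0).cumcount()

print(occurrence.tolist())
```

The first HEADING1 and the first HEADING2 both get 0, the second ones get 1. Pairing that counter with the header gives each entry a unique (row number, header) label, which is what `unstack` needs to pivot the headers into columns.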

Unstacking

s.unstack()

   HEADING1  HEADING2
0   value 1   value 2
1  value 11  value 12

Full demo

import pandas as pd
from io import StringIO

txt = """HEADING1
value 1

HEADING2
value 2

HEADING1
value 11

HEADING2
value 12"""

cols, vals = zip(*[line.split('\n') for line in StringIO(txt).read().split('\n\n')])

s = pd.Series(vals, cols)
s.index = [s.groupby(level=0).cumcount(), s.index]

s.unstack()

   HEADING1  HEADING2
0   value 1   value 2
1  value 11  value 12
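Real files often end with a newline, use Windows line endings, or contain more than one blank line between blocks, any of which can upset a plain split on '\n\n'. The sketch below wraps the same Series/unstack approach in a function that tolerates those cases. The function name `parse_blocks` and the regular expression are assumptions made for this illustration, not part of the original answer, and the function still assumes non-empty input in which every block has exactly a header line followed by a value line.

```python
import re

import pandas as pd


def parse_blocks(text):
    """Turn 'HEADER\\nvalue' blocks separated by blank lines into a DataFrame.

    Hypothetical helper: assumes non-empty text where every block is a
    header line followed by a value line.
    """
    # Normalise Windows line endings and drop leading/trailing blank lines.
    text = text.replace("\r\n", "\n").strip()
    # Split on one or more blank lines to get the individual blocks.
    blocks = re.split(r"\n\s*\n", text)
    # Each block becomes a [header, value] pair.
    pairs = [block.strip().split("\n", 1) for block in blocks]
    cols, vals = zip(*pairs)

    s = pd.Series(vals, index=cols)
    # Number repeated headers so that (row number, header) is unique.
    row = s.groupby(level=0).cumcount().to_numpy()
    s.index = pd.MultiIndex.from_arrays([row, s.index])
    return s.unstack()


sample = "HEADING1\nvalue 1\n\nHEADING2\nvalue 2\n\n\nHEADING1\nvalue 11\n\nHEADING2\nvalue 12\n"
df = parse_blocks(sample)
print(df)
```

For a file on disk, the text could be read inside a `with open(textfile) as f:` block and `f.read()` passed to the function.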

Regarding "python - Parsing a text file in Python and converting it to a pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44288169/
