
python - Parsing specific columns from a dataset in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:52:09

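I have a dataset with multiple columns, and I am only interested in analyzing the data from six of them. It is in a txt file, and I want to load the file and pull out the following columns (0, 1, 2, 4, 6, 7) with the headers (time, mode, event, xcoord, ycoord, phi). There are ten columns in total; here is a sample of the data: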

1385940076332   3   M   subject_avatar  -30.000000  1.000000    -59.028107  180.000000  0.000000    0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076342 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076346 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076350 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076353 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076356 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000

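When I use the following code to parse the data into columns, it seems to only count the data, but I want to be able to list the data for further analysis. This is the code I used, from @alko: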

import pandas as pd
df = pd.read_csv('filtered.txt', header=None, false_values=None, sep=r'\s+')[[0, 1, 2, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print(df)

Here is what that code returns:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 115534 entries, 0 to 115533
Data columns (total 6 columns):
time 115534 non-null values
mode 115534 non-null values
event 115534 non-null values
xcoord 115534 non-null values
ycoord 115534 non-null values
phi 115534 non-null values
dtypes: float64(3), int64(2), object(1)

So the goal is to pull these 6 columns out of the original 10, label them, and list them.

Best answer

You can use the pandas read_csv parser:

import pandas as pd
from io import StringIO

s = """1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076342 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076346 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076350 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076353 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076356 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000"""

# Parse whitespace-separated data with no header row, then keep the six wanted columns.
df = pd.read_csv(StringIO(s), header=None, sep=r'\s+')[[0, 2, 3, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print(df)
# Output (float formatting may vary with the pandas version):
#             time mode           event  xcoord     ycoord  phi
# 0  1385940076332    M  subject_avatar     -30 -59.028107  180
# 1  1385940076336    M  subject_avatar     -30 -59.028107  180
# 2  1385940076339    M  subject_avatar     -30 -59.028107  180
# 3  1385940076342    M  subject_avatar     -30 -59.028107  180
# 4  1385940076346    M  subject_avatar     -30 -59.028107  180
# 5  1385940076350    M  subject_avatar     -30 -59.028107  180
# 6  1385940076353    M  subject_avatar     -30 -59.028107  180
# 7  1385940076356    M  subject_avatar     -30 -59.028107  180

Note that I corrected the column indices, because the ones given in your question appear to be incorrect.
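
As a supplementary sketch (not part of the original answer): the summary in the question is most likely how older pandas versions printed a DataFrame too large for the display limits, so the data was parsed and simply not shown row by row. Assuming the same whitespace-delimited layout, the columns can also be selected at read time with `usecols`, and rows can be listed explicitly with `head()` or `to_string()`. The two-row `sample` string is a stand-in for `filtered.txt`, and the column indices follow the corrected ones used above.

```python
from io import StringIO

import pandas as pd

# Two sample rows standing in for 'filtered.txt' (an assumption for illustration).
sample = (
    "1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
    "1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000\n"
)

# Read only the wanted columns; with header=None they keep their
# integer positions (0, 2, 3, 4, 6, 7) as labels until renamed.
df = pd.read_csv(StringIO(sample), header=None, sep=r"\s+",
                 usecols=[0, 2, 3, 4, 6, 7])
df.columns = ["time", "mode", "event", "xcoord", "ycoord", "phi"]

print(df.head())       # first few rows only
print(df.to_string())  # every row, regardless of display limits

# Pull one column out as a plain Python list for further analysis.
xs = df["xcoord"].tolist()
```

For the real file, replacing `StringIO(sample)` with the path `'filtered.txt'` should be enough. Note that `to_string()` on 115534 rows produces a very long printout, so `head()` or slicing such as `df[:20]` is usually more practical.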

Regarding python - Parsing specific columns from a dataset in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20802891/
