
python - Parsing data with non-uniform rows in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:39:04

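I have a dataset that I want to parse for analysis. I want to extract specific columns and then split them into the part before and the part after the inconsistent rows. Here is a sample of my data; note that the three rows in the middle do not match the format of the others: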

1386865618963   1   M   subject_avatar  3.636229    1.000000    5.422941    30.200327   0.000000    0.000000
1386865618965 1 M subject_avatar 3.631835 1.000000 5.415390 30.200327 0.000000 0.000000
1386865618966 2 M subject_avatar 3.627432 1.000000 5.407826 30.200327 0.000000 0.000000
1386865618968 1 M subject_avatar 3.625223 1.000000 5.404030 30.200327 0.000000 0.000000
1386865618970 1 M subject_avatar 3.620788 1.000000 5.396411 30.200327 0.000000 0.000000
1386865618970 0 D 4345048336
1386865618970 0 D 4345763672
1386865618971 0 I BOXGEOM (45.0, 0.0, -45.0, 19.0, 3.5, 19.0) {'callback': <bound method YCEnvironment.dropoff of <navigate.YCEnvironment instance at 0x103065440>>, 'cbargs': (0, {'width': 1.75, 'image': <pyepl.display.Image object at 0x102f9da90>, 'height': 4.75, 'volbitSize': (0.5, 0.71999999999999997), 'name': 'Julia'}, {'width': 0.69999999999999996, 'name': 'Flower Patch', 'realpos': (45.0, 0.0, -45.0), 'image': <pyepl.display.Image object at 0x102fc3f50>, 'realsize': (7.0, 3.5, 7.0), 'type': 'store', 'volbitSize': (0.5, 0.5), 'height': 0.34999999999999998}), 'permiable': True} 4926595152
1386865618972 1 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865618992 2 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865618996 1 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865618998 2 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865619002 1 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865619005 1 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000
1386865619008 1 M subject_avatar 3.621182 1.000000 5.396492 30.200327 0.000000 0.000000

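I previously asked a question (Parsing specific columns from a dataset in python) about parsing this data into columns, but the columns only show the number of items in each column rather than the items themselves.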

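I realize these are two separate problems (splitting into columns, and splitting before and after the inconsistent rows), but any help with the parsing would be much appreciated!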

Best answer

A simple idea:

You can preprocess the raw file to skip all the irrelevant rows, perhaps like this:

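# Read every line of the raw log into memory.
with open('raw.txt', 'r') as infile:
    lines = infile.readlines()
# Write out only the rows that look like regular data rows.
with open('filtered.txt', 'w') as outfile:
    for line in lines:
        if 'subject_avatar' in line:  # or some other, better rule
            outfile.write(line)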
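
The comment above leaves room for "better rules". One possible stricter rule, sketched here under the assumption that every regular row has exactly 10 whitespace-separated fields with numeric values from the fifth field onward (as in the sample data), is shown below. The names `is_regular_row` and `filter_rows` are hypothetical helpers, not part of the original answer.

```python
def is_regular_row(line, expected_fields=10):
    """Return True if the line looks like one of the 10-column rows.

    Assumption: regular rows have exactly `expected_fields`
    whitespace-separated fields, and the fifth field onward is numeric.
    """
    fields = line.split()
    if len(fields) != expected_fields:
        return False
    try:
        for value in fields[4:]:
            float(value)
    except ValueError:
        return False
    return True


def filter_rows(lines):
    """Keep only the lines that pass is_regular_row."""
    return [line for line in lines if is_regular_row(line)]
```

Swapping `is_regular_row(line)` in for the `'subject_avatar' in line` test would also reject a row that mentions `subject_avatar` but is otherwise malformed.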

You can then process the clean data in filtered.txt with pandas or by some other means.
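
As one example of "some other means", the following is a minimal standard-library sketch that pulls chosen columns out of already-filtered rows and returns the values themselves rather than their counts. The helper name `extract_columns` and the choice of columns 0 and 4 are assumptions made for illustration; the sample rows are copied from the question.

```python
def extract_columns(lines, indices):
    """Return one list of values per requested column index (0-based).

    Assumption: every line is a regular row with whitespace-separated fields.
    """
    columns = [[] for _ in indices]
    for line in lines:
        fields = line.split()
        for column, index in zip(columns, indices):
            column.append(fields[index])
    return columns


rows = [
    "1386865618965 1 M subject_avatar 3.631835 1.000000 5.415390 30.200327 0.000000 0.000000",
    "1386865618966 2 M subject_avatar 3.627432 1.000000 5.407826 30.200327 0.000000 0.000000",
]
timestamps, fifth_column = extract_columns(rows, [0, 4])
fifth_column = [float(value) for value in fifth_column]
```

The lines read from filtered.txt can be passed in place of `rows`. With pandas, `pandas.read_csv` with a whitespace separator and `header=None` should give a comparable result, though that route is not shown here.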

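with open('d.txt', 'r') as infile:
    lines = infile.readlines()

def is_irregular(line):
    # The non-matching rows carry 0 in their second field. Splitting on
    # whitespace avoids relying on a fixed character offset such as line[16].
    fields = line.split()
    return len(fields) > 1 and fields[1] == '0'

i = 0
with open('filtered_part1.txt', 'w') as outfile:
    # Copy rows up to the first irregular one.
    while i < len(lines) and not is_irregular(lines[i]):
        outfile.write(lines[i])
        i += 1
# Skip the run of irregular rows.
while i < len(lines) and is_irregular(lines[i]):
    i += 1
with open('filtered_part2.txt', 'w') as outfile:
    # Copy everything after the irregular run.
    outfile.writelines(lines[i:])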

This gives an ugly but workable split. Basically, it finds the rows marked with 0 and skips them.
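
If the log might contain more than one run of irregular rows, the same idea can be sketched in memory with `itertools.groupby`, which groups consecutive lines by whether they are irregular. This is an alternative to the file-based split, not the answer's own method. The helper names are hypothetical, and the rule "second field is 0" is an assumption taken from the sample data.

```python
from itertools import groupby


def is_irregular(line):
    """Assumption: irregular rows carry 0 in their second field."""
    fields = line.split()
    return len(fields) > 1 and fields[1] == '0'


def split_segments(lines):
    """Return lists of consecutive regular rows, split at each irregular run."""
    return [
        list(group)
        for irregular, group in groupby(lines, key=is_irregular)
        if not irregular
    ]
```

For the sample in the question this would yield two segments: the five rows before the irregular block and the seven rows after it.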

Regarding python - Parsing data with non-uniform rows in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20954281/
