
python - Read a csv into a pandas df where a row may be split across multiple lines

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:05:51

I want to read this csv file into a pandas.DataFrame:

Id,Name,Shape Library,Page Name,Line Connection Start,Line Connection End,Text Area 1,Text Area 2,Text Area 3,Text Area 4
1,Page,,0:Page 1,,,,,,
2,Table,Tables,0:Page 1,,,Openingsuren gemeentehuis,Action,"Is het gemeentehuis open?
Wat zijn de openingsuren van het gemeentehuis
Wanneer is het gemeentehuis open","webhook
De webserver staat niet op denk ik, gelieve ... te contacteren"
3,easy,Tables,0:Page 1,,,Openignsuren andere dag,Action,"En morgen?",
4,easy,Tables,0:Page 1,,,Openingsuren,,,

However, some rows span multiple lines (see Id 2).

Is there a way to read this correctly into a pandas df?
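In this particular sample, every value that spans several lines is wrapped in double quotes, and quoted fields containing line breaks are valid CSV that `pd.read_csv` already parses as single values. A minimal check, assuming the data is embedded as a string (the name `CSV_TEXT` is only for illustration) instead of being read from a file:

```python
import io

import pandas as pd

# The sample from the question, embedded as a string for illustration.
# The multi-line values are enclosed in double quotes.
CSV_TEXT = '''Id,Name,Shape Library,Page Name,Line Connection Start,Line Connection End,Text Area 1,Text Area 2,Text Area 3,Text Area 4
1,Page,,0:Page 1,,,,,,
2,Table,Tables,0:Page 1,,,Openingsuren gemeentehuis,Action,"Is het gemeentehuis open?
Wat zijn de openingsuren van het gemeentehuis
Wanneer is het gemeentehuis open","webhook
De webserver staat niet op denk ik, gelieve ... te contacteren"
3,easy,Tables,0:Page 1,,,Openignsuren andere dag,Action,"En morgen?",
4,easy,Tables,0:Page 1,,,Openingsuren,,,
'''

# read_csv keeps the line breaks inside the quoted fields
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)
print(repr(df.loc[1, "Text Area 3"]))
```

Empty fields come back as NaN here rather than as empty strings, and `Id` is parsed as an integer column.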

Best answer

You can write your own parser with the csv module and then pass a generator to pandas, for example:

Code:

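import csv
import pandas as pd

def read_my_csv(file_handle):
    # build csv reader (it already joins quoted fields that contain line breaks)
    reader = csv.reader(file_handle)

    # get and yield the header
    header = next(reader)
    yield header

    # for each row, append the following line(s) until the row has as many
    # fields as the header, then yield the row
    for row in reader:
        while len(row) < len(header):
            # next(reader, None) avoids a bare StopIteration, which inside a
            # generator becomes a RuntimeError on Python 3.7+ (PEP 479)
            extra = next(reader, None)
            if extra is None:  # end of file reached in the middle of a record
                break
            row += extra
        yield row

# newline='' lets the csv module handle line endings itself;
# the old 'rU' mode was removed in Python 3.11
with open('file1', newline='') as f:
    generator = read_my_csv(f)
    columns = next(generator)
    df = pd.DataFrame(generator, columns=columns)

print(df)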
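The `csv.reader` created at the top of `read_my_csv` does most of the work for quoted values: a field enclosed in double quotes may contain line breaks and is still returned as one field. A tiny sketch with a hypothetical two-record input:

```python
import csv
import io

# Hypothetical two-record input: the second field of the first record is
# quoted and contains a line break.
snippet = 'a,"line one\nline two"\nb,single\n'

rows = list(csv.reader(io.StringIO(snippet)))
print(rows)  # [['a', 'line one\nline two'], ['b', 'single']]
```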

Result:

   Id   Name  Shape Library  Page Name  Line Connection Start  Line Connection End  \
0   1   Page                  0:Page 1
1   2  Table         Tables   0:Page 1
2   3   easy         Tables   0:Page 1
3   4   easy         Tables   0:Page 1

                 Text Area 1  Text Area 2  \
0
1  Openingsuren gemeentehuis       Action
2    Openignsuren andere dag       Action
3               Openingsuren

                                         Text Area 3  \
0
1  Is het gemeentehuis open?\nWat zijn de opening...
2                                         En morgen?
3

                                         Text Area 4
0
1  webhook\nDe webserver staat niet op denk ik, g...
2
3
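For this sample, every row produced by `csv.reader` already has 10 fields, so the `while` loop never fires. It only comes into play when a record is broken across physical lines without quotes. The sketch below restates the generator (with the same end-of-file guard) and feeds it a hypothetical three-column input in which record 2 is broken exactly where a comma would be:

```python
import csv
import io

import pandas as pd


def read_my_csv(file_handle):
    """Yield the header, then each row, gluing short rows to the next line(s)."""
    reader = csv.reader(file_handle)
    header = next(reader)
    yield header
    for row in reader:
        while len(row) < len(header):
            extra = next(reader, None)
            if extra is None:  # input ended mid-record; keep what we have
                break
            row += extra
        yield row


# Hypothetical input: the line break in record 2 stands in for the comma
# between its last two fields.
snippet = "Id,Name,Page Name\n1,Page,0:Page 1\n2,Table\n0:Page 1\n"

gen = read_my_csv(io.StringIO(snippet))
columns = next(gen)
df = pd.DataFrame(gen, columns=columns)
print(df)
```

The merge only lines up when the break falls between two fields. A break inside an unquoted value splits that value into two fields, so the merged row ends up longer than the header.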

Regarding "python - Read a csv into a pandas df where a row may be split across multiple lines", a similar question exists on Stack Overflow: https://stackoverflow.com/questions/44485267/
