Python pandas data cleaning

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:39:09

I'm new to Python pandas and I'm having a hard time implementing the following data cleaning. Please help.

My actual data (CSV file link: https://s3.amazonaws.com/rajaampledata/data.csv):

Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
30/08/2012,FDR-,,,10.50-,"5,114,897.40"
"",AU;541411;301218;RAJA,,,,
"",J;RTGS-AUTO-,,,,
"",TRANSAC,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-,,,"1,000,000.00","3,225,700.00"
"",IN;112138;100318;BANK,,,,
"",ACC;,,,,

I want to get the data as follows:

30/08/2012,TFR-TFR:0000000101-Kumar - S/O To:600010013441,,,"1,952.50","4,000.000"
30/08/2012,FDR-AU;541411;301218;RAJAJ;RTGS-AUTO-TRANSAC,,,10.50-,"5,114,897.40"
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-IN;112138;100318;BANKACC;,,,"1,000,000.00","3,225,700.00"

Best answer

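If the first field of the current row is empty, append that row's text to the previous row. Once the rows are merged, join each one into a single string with a comma separator.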

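import csv

with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # first row holds the column names
    lines = []
    for r in reader:
        if not r:
            continue  # skip blank lines
        if r[0] == '':
            # continuation row: append its text to the previous row's Description
            lines[-1][1] = lines[-1][1] + r[1]
        else:
            lines.append(r)

# join each merged row back into one comma-separated string
lines = [','.join(i) for i in lines]

print(lines)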
>>['30/08/2012,TFR-TFR:0000000101-Kumar - S/O To:6.0001E+11,,,1,952.50-,4,000.00',
'30/08/2012,FDR-AU;541411;301218;RAJAJ;RTGS-AUTO-TRANSAC,,,10.50-,5,114,897.40',
'26/08/2012,DEP-IN162071/D61519,,,1,000.83,6,100,098.32',
'26/08/2012,WDL-IN B CM 20120826,,,180.32-,789,126.31',
'25/08/2012,103-IN;112138;100318;BANKACC;,,,1,000,000.00,3,225,700.00']
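Note that ','.join does not re-quote fields, so amounts such as 1,952.50- come out unquoted and the result is no longer valid CSV: every thousands separator would be read as a column break. The sketch below merges the same continuation rows but writes them back with csv.writer, which quotes any field that contains a comma. The sample rows are embedded with io.StringIO only so the example runs on its own, and merge_continuation_rows is a hypothetical helper name, not part of the original answer.

```python
import csv
import io

# Sample rows copied from the question; for the real file, open 'data.csv' instead.
SAMPLE = '''Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
30/08/2012,FDR-,,,10.50-,"5,114,897.40"
"",AU;541411;301218;RAJA,,,,
"",J;RTGS-AUTO-,,,,
"",TRANSAC,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-,,,"1,000,000.00","3,225,700.00"
"",IN;112138;100318;BANK,,,,
"",ACC;,,,,
'''


def merge_continuation_rows(reader):
    """Hypothetical helper: fold rows with an empty Date into the previous row's Description."""
    merged = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if row[0] == '' and merged:
            merged[-1][1] += row[1]
        else:
            merged.append(row)
    return merged


reader = csv.reader(io.StringIO(SAMPLE))
headers = next(reader)
rows = merge_continuation_rows(reader)

# csv.writer uses QUOTE_MINIMAL by default, so "1,952.50-" keeps its quotes.
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(headers)
writer.writerows(rows)
print(out.getvalue(), end='')
```

The output keeps the header row and quotes only the fields that need it, so it can be read back with csv.reader or pandas without the amounts being split.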

If you want the headers, read the first row of the CSV; in the answer's code, next(reader) already stores it in headers.
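Because the question is framed around pandas, here is a roughly equivalent pandas sketch. It is an alternative to the csv-module answer, not part of it, and it assumes pandas is installed. It relies on read_csv renaming the duplicate Description header to Description.1 and reading the quoted empty Date fields as NaN; each row with a non-empty Date starts a new group, and the Description pieces of a group are joined into one string.

```python
import io

import pandas as pd

# Same sample as the question; pass 'data.csv' instead of io.StringIO(SAMPLE) for the real file.
SAMPLE = '''Date,Description,Description,Ref. No,Amount,Balance
30/08/2012,TFR-TFR:0000000101-,,,"1,952.50-","4,000.000"
"",Kumar - S/O To:,,,,
"",600010013441,,,,
30/08/2012,FDR-,,,10.50-,"5,114,897.40"
"",AU;541411;301218;RAJA,,,,
"",J;RTGS-AUTO-,,,,
"",TRANSAC,,,,
26/08/2012,DEP-IN162071/D61519,,,"1,000.83","6,100,098.32"
26/08/2012,WDL-IN B CM 20120826,,,180.32-,"789,126.31"
25/08/2012,103-,,,"1,000,000.00","3,225,700.00"
"",IN;112138;100318;BANK,,,,
"",ACC;,,,,
'''

# dtype=str keeps amounts such as "1,952.50-" as text; empty fields become NaN.
df = pd.read_csv(io.StringIO(SAMPLE), dtype=str)

# A non-empty Date marks the start of a new transaction; cumsum numbers the groups.
group_id = df['Date'].notna().cumsum().rename('group')

merged = df.groupby(group_id).agg({
    'Date': 'first',           # first non-null value in the group
    'Description': ''.join,    # concatenate the description fragments
    'Description.1': 'first',
    'Ref. No': 'first',
    'Amount': 'first',
    'Balance': 'first',
}).reset_index(drop=True)

print(merged)
```

The amounts stay as strings here; converting them to numbers (stripping the thousands separators and moving the trailing minus sign) would be a separate step.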

A similar question about Python pandas data cleaning can be found on Stack Overflow: https://stackoverflow.com/questions/53014674/
