python - Comparing large CSV files with different numbers of columns using Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 18:19:18

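I am new to Python programming and I am trying to join two CSV files that have different numbers of columns. The goal is to find the missing records and create a report containing specific columns from the main file.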

Samples of the two CSV files, copied directly from Excel:

Sample CSV 1 (combine201709.csv)

start_time  end_time    aitechid    hh_village  grpdetails1/farmername  grpdetails1/farmermobile
2016-11-26T14:01:47.329+03 2016-11-26T14:29:05.042+03 AI00001 2447 KahsuGebru 919115604
2016-11-26T19:34:42.159+03 2016-11-26T20:39:27.430+03 936891238 2473 Moto Aleka 914370833
2016-11-26T12:13:23.094+03 2016-11-26T14:25:19.178+03 914127382 2390 Hagos 914039654
2016-11-30T14:31:28.223+03 2016-11-30T14:56:33.144+03 920784222

Sample CSV 2 (combinedmissingrecords.csv)

farmermobile
941807851
946741296
9
920212218
915
939555303
961579437
919961811
100004123
972635273
918166831
961579437
922882638
100006273
919728710
30000739
920770648
100004727
963767487
915855665
932255143
923531603
0
931875236
918027506
8
916353266
918020303
924359729
934623027
916585963
960791618
988047183
100002632
300007241
918271897
300007238
918250712

I tried this, but could not get the expected output:

    import pandas as pd

    normalize = lambda x: "%.4f" % float(x) # round
    df = pd.read_csv("/media/dmogaka/DATA/week progress/week4/combine201709.csv", index_col=(0,1), usecols=(1, 2, 3,4),
                     header=None, converters=dict.fromkeys([1,2]))
    df2 = pd.read_csv("/media/dmogaka/DATA/week progress/week4/combinedmissingrecords.csv", index_col=(0,1), usecols=(0),
                      header=None, converters=dict.fromkeys([1,2]))
    result = df2.merge(df[['aitechid','grpdetails1/farmermobile','grpdetails1/farmername']],
                       left_on='farmermobile', right_on='grpdetails1/farmermobile')
    result.to_csv("/media/dmogaka/DATA/week progress/week4/output.csv", header=None) # write as csv

Error message

/usr/bin/python3.5 "/media/dmogaka/DATA/Panda tut/test/test.py"
Traceback (most recent call last):
File "/media/dmogaka/DATA/Panda tut/test/test.py", line 7, in <module>
header=None, converters=dict.fromkeys([1,2]))
File "/home/dmogaka/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/dmogaka/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 405, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/dmogaka/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 764, in __init__
self._make_engine(self.engine)
File "/home/dmogaka/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 985, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/dmogaka/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1605, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 461, in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:4968)
TypeError: 'int' object is not iterable

Process finished with exit code 1
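
A note on the traceback, offered as a likely reading rather than something stated in the original post: if the script is exactly as posted, line 7 of test.py is the end of the second read_csv call, which passes usecols=(0). In Python, parentheses alone do not make a tuple, so (0) is the plain integer 0, and code that tries to iterate over it fails with the same message as the last line of the traceback. The sketch below uses only plain Python (no pandas) to show the difference; the variable names are made up for illustration.

```python
# (0) is just the integer 0 in parentheses; the trailing comma makes a tuple.
not_a_tuple = (0)
one_tuple = (0,)

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple

# Anything that tries to iterate over the bare int fails with the same
# message that ends the traceback.
try:
    list(not_a_tuple)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # 'int' object is not iterable
```

Under that reading, writing usecols=[0] or usecols=(0,) would get past this particular error. The other arguments (header=None, index_col=(0,1)) would still need attention before the merge on column names can work.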

Best answer

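Try this (d1 and d2 presumably correspond to df and df2 in the question):

    d2.merge(d1[['aitechid','grpdetails1/farmermobile','grpdetails1/farmername']],
             left_on='farmermobile', right_on='grpdetails1/farmermobile')

Or rename the column so that both frames share the key name and merge on it implicitly:

    d2.merge(d1[['aitechid','grpdetails1/farmermobile','grpdetails1/farmername']] \
             .rename(columns={'grpdetails1/farmermobile':'farmermobile'}))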
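
For the merge above to select columns by name, both files have to be read with their header row (the read_csv default) rather than header=None. The following is a minimal end-to-end sketch under several assumptions: the two in-memory tables are tiny comma-separated stand-ins for the real files, their values are borrowed from the samples in the question and arranged so that some numbers match, and the names csv1, csv2, wanted, report, found and not_found are hypothetical. It also departs from the answer in one respect: it uses how="left" with indicator=True, so that numbers with no match in CSV 1 remain visible instead of being dropped by the default inner merge.

```python
import io

import pandas as pd

# Tiny stand-ins for the two files (assumed layout: comma-separated, header
# row first). Values are borrowed from the samples in the question.
csv1 = io.StringIO(
    "aitechid,hh_village,grpdetails1/farmername,grpdetails1/farmermobile\n"
    "AI00001,2447,KahsuGebru,919115604\n"
    "936891238,2473,Moto Aleka,914370833\n"
    "914127382,2390,Hagos,914039654\n"
)
csv2 = io.StringIO("farmermobile\n919115604\n941807851\n914039654\n")

wanted = ["aitechid", "grpdetails1/farmermobile", "grpdetails1/farmername"]

# The default header=0 keeps the column names; dtype=str keeps IDs and phone
# numbers as text so they compare exactly.
d1 = pd.read_csv(csv1, usecols=wanted, dtype=str)
d2 = pd.read_csv(csv2, dtype=str)

# Left merge keeps every number from d2; indicator=True adds a "_merge"
# column saying whether a match was found in d1.
report = d2.merge(
    d1.rename(columns={"grpdetails1/farmermobile": "farmermobile"}),
    on="farmermobile",
    how="left",
    indicator=True,
)

found = report[report["_merge"] == "both"].drop(columns="_merge")
not_found = report.loc[report["_merge"] == "left_only", "farmermobile"]

print(found)
print(not_found.tolist())
```

With real files, the StringIO objects would be replaced by the file paths, and found.to_csv(..., index=False) would write the report. If the files are tab-separated rather than comma-separated, read_csv would also need sep="\t".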

Regarding "python - Comparing large CSV files with different numbers of columns using Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46257988/
