
python - Merging two CSV files using Python

Reposted. Author: IT老高. Updated: 2023-10-28 20:49:32

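OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy for me to do, but I find that I still do not have a very good grasp of Python. I tried the example at "How to combine 2 csv files with common column value, but both files have different number of lines", and that was helpful, but I still do not have the results I was hoping to achieve.

Essentially I have 2 csv files with a common first column. I would like to merge the two, i.e.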

filea.csv

title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921

fileb.csv

title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956

output.csv (not the one I am getting but what I want)

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951

The code I was trying:

'''
testing merging of 2 csv files
'''
import csv
import array
import os

with open('Z:\\Desktop\\test\\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\\Desktop\\test\\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Best answer

When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the csv files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

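We see there's an extra column of data (note that the first line of fileb.csv, title,mar,apr,may,jun, has an extra comma at the end). Since that column contains only NaN, we can get rid of it easily: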

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
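
To see where the extra column comes from, here is a small sketch that uses only the standard-library csv module. The two lines are copied from fileb.csv as shown in the question, and io.StringIO stands in for the real file. The header parses into six fields (the last one empty) while the data row has only five, which is why pandas adds an unnamed, all-NaN column:

```python
import csv
import io

# Header and first data row copied from fileb.csv; note the trailing comma
# at the end of the header line.
header_line = "title,mar,apr,may,jun,\n"
data_line = "darn,0.631,1.321,0.951,1.751\n"

rows = list(csv.reader(io.StringIO(header_line + data_line)))
header, first_row = rows

print(len(header), header)        # 6 ['title', 'mar', 'apr', 'may', 'jun', '']
print(len(first_row), first_row)  # 5 fields, so the sixth column has no data
```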

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
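
By default, merge performs an inner join, so only titles present in both frames survive. That makes no difference for the files above, but it matters when the two files do not list the same rows. The sketch below uses made-up, cut-down data (the "four" row and the single-column frames are assumptions for illustration, not part of the question) to compare the default with how="outer":

```python
import io

import pandas as pd

# Hypothetical, shortened versions of the two files: "three" appears only
# in a, and the invented row "four" appears only in b.
a = pd.read_csv(io.StringIO("title,jan\ndarn,0.421\nok,1.036\nthree,1.146\n"))
b = pd.read_csv(io.StringIO("title,mar\ndarn,0.631\nok,1.001\nfour,0.285\n"))

inner = a.merge(b, on="title")               # default: how="inner"
outer = a.merge(b, on="title", how="outer")  # keep unmatched rows, fill with NaN

print(inner)
print(outer)
```

If rows missing from one file should still appear in output.csv, how="outer" (or "left" / "right") is probably the option to reach for.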

And finally write it out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
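
If pandas is not available, roughly the same result can be had with the standard-library csv module. The following is a minimal Python 3 sketch, not the answer's method: the function name merge_csv and its parameters are hypothetical, and it assumes both files have a header row and a shared key column. The attempt in the question kept only row[3] from each file, which is why just the feb and may columns came out; here every column is carried over.

```python
import csv


def merge_csv(path_a, path_b, path_out, key="title"):
    """Inner-join two CSV files on a shared key column (hypothetical helper)."""
    with open(path_b, newline="") as fb:
        reader_b = csv.DictReader(fb)
        # Skip the key itself and the empty header name produced by the
        # trailing comma in fileb.csv.
        fields_b = [name for name in reader_b.fieldnames if name and name != key]
        rows_b = {row[key]: row for row in reader_b}

    with open(path_a, newline="") as fa, open(path_out, "w", newline="") as fo:
        reader_a = csv.DictReader(fa)
        writer = csv.DictWriter(fo, fieldnames=reader_a.fieldnames + fields_b)
        writer.writeheader()
        for row in reader_a:
            match = rows_b.get(row[key])
            if match is None:
                continue  # key missing from the second file: drop the row
            row.update({name: match[name] for name in fields_b})
            writer.writerow(row)
```

A call such as merge_csv("filea.csv", "fileb.csv", "output.csv") would then write the merged file. Values are copied as strings, so 956 stays 956 rather than becoming 956.0 as in the pandas output.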

Regarding "python - Merging two CSV files using Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16265831/
