python - How to merge two CSV files on a field and keep the same number of attributes on each record?

Reposted. Author: 太空狗. Updated: 2023-10-29 22:13:19

I am trying to merge two CSV files based on a specific field in each file.

file1.csv

id,attr1,attr2,attr3
1,True,7,"Purple"
2,False,19.8,"Cucumber"
3,False,-0.5,"A string with a comma, because it has one"
4,True,2,"Nope"
5,True,4.0,"Tuesday"
6,False,1,"Failure"

file2.csv

id,attr4,attr5,attr6
2,"python",500000.12,False
5,"program",3,True
3,"Another string",-5,False

This is the code I am using:

import csv
from collections import OrderedDict

with open('file2.csv','r') as f2:
    reader = csv.reader(f2)
    fields2 = next(reader,None) # Skip headers
    dict2 = {row[0]: row[1:] for row in reader}

with open('file1.csv','r') as f1:
    reader = csv.reader(f1)
    fields1 = next(reader,None) # Skip headers
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)

result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('merged.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)

I get output like this, which merges properly, but the rows do not all have the same number of attributes:

1,True,7,Purple
2,False,19.8,Cucumber,python,500000.12,False
3,False,-0.5,"A string with a comma, because it has one",Another string,-5,False
4,True,2,Nope
5,True,4.0,Tuesday,program,3,True
6,False,1,Failure

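file2 will not have a record for every id in file1. For those ids, I would like the merged file to have empty fields in place of the file2 columns. For example, id 1 would look like this: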

1,True,7,Purple,,,

How can I add empty fields to the records that have no data in file2, so that every record in the merged CSV has the same number of attributes?

Best answer

If we're not using pandas, I'd refactor it to something like this:

import csv
from collections import OrderedDict

filenames = "file1.csv", "file2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp: # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["id"], {}).update(row)

fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])

which gives

id,attr1,attr2,attr3,attr4,attr5,attr6
1,True,7,Purple,,,
2,False,19.8,Cucumber,python,500000.12,False
3,False,-0.5,"A string with a comma, because it has one",Another string,-5,False
4,True,2,Nope,,,
5,True,4.0,Tuesday,program,3,True
6,False,1,Failure,,,

For comparison, the pandas equivalent would be something like this:

import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
merged = df1.merge(df2, on="id", how="outer").fillna("")
merged.to_csv("merged.csv", index=False)

This seems much simpler to me, and it means you can spend more time working with your data and less time reinventing wheels.

Regarding "python - How to merge two CSV files on a field and keep the same number of attributes on each record?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23343919/
