
python - Merge 2 CSV files on a common field using Python

Reposted. Author: 行者123. Updated: 2023-11-30 23:38:51

I generated 2 CSV files from 2 MySQL tables. Now I want to merge the two files.

I manually added this header to the first CSV:

ID,name,sector,sub_sector

This is the header of the second CSV:

ID,url

My goal is to end up with 1 file:

ID,name,sector,sub_sector,url

Note: not every record in the first file has a matching record in the second file.

This is the code snippet I am using:

#!/usr/bin/env python
import glob, csv
if __name__ == '__main__':

    infiles = glob.glob('./*.csv')
    out = 'temp.csv'
    data = {}
    fields = []

    for fname in infiles:
        df = open(fname, 'rb')
        reader = csv.DictReader(df)
        for line in reader:
            # assuming the field is called ID
            if line['ID'] not in data:
                data[line['ID']] = line
            else:
                for k,v in line.iteritems():
                    if k not in data[line['ID']]:
                        data[line['ID']][k] = v
            for k in line.iterkeys():
                if k not in fields:
                    fields.append(k)
        del reader
        df.close()

    writer = csv.DictWriter(open(out, "wb"), fields, extrasaction='ignore', dialect='excel')
    # write the header at the top of the file
    writer.writeheader()
    writer.writerows(data)
    del writer

It is taken from another SO thread. This is the error I get:

  File "db_work.py", line 30, in <module>
    writer.writerows(data)
  File "/usr/lib/python2.7/csv.py", line 153, in writerows
    rows.append(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 144, in _dict_to_list
    ", ".join(wrong_fields))
ValueError: dict contains fields not in fieldnames: 4, 4, 4, 6
~/Development/python/DB$ python db_work.py
Traceback (most recent call last):
  File "db_work.py", line 30, in <module>
    writer.writerows(data)
  File "/usr/lib/python2.7/csv.py", line 153, in writerows
    rows.append(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 145, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
AttributeError: 'str' object has no attribute 'get'

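Any ideas on how to fix this?

Best answer

.writerows() expects a list of rows, but you are passing a dict instead. Iterating over a dict yields its keys, which here are ID strings, hence the error 'str' object has no attribute 'get'. I think you want to write only the values of data: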

writer = csv.DictWriter(open(out, "wb"), fields, dialect='excel')
# write the header at the top of the file
writer.writeheader()
writer.writerows(data.values())
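
To make the difference concrete, here is a minimal Python 3 sketch (the code in this thread is Python 2). The sample rows, the in-memory buffer, and the names rows_seen_by_writerows and merged_csv are assumptions made for illustration; they are not part of the original question. It shows that iterating the dict gives only the ID keys, while data.values() gives the row dicts that DictWriter needs. Rows without a url are padded using restval.

```python
import csv
import io

# Hypothetical sample shaped like the question's `data` dict:
# keys are IDs, values are the merged row dicts.
data = {
    "1": {"ID": "1", "name": "Acme", "sector": "Tech",
          "sub_sector": "Software", "url": "http://example.com/acme"},
    "2": {"ID": "2", "name": "Globex", "sector": "Energy",
          "sub_sector": "Oil"},  # no matching url record
}
fields = ["ID", "name", "sector", "sub_sector", "url"]

# Iterating a dict yields its keys; this is what writerows(data) would receive.
rows_seen_by_writerows = list(data)

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fields, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(data.values())  # pass the row dicts, not the outer dict
merged_csv = buffer.getvalue()
```

When writing to a real file in Python 3, the csv module documentation recommends opening it in text mode with newline='' rather than the "wb" mode used in Python 2.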

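Personally, I would read only the file with the id, url rows first and add them to a dict, then read the other file and write it out one row at a time, appending the corresponding url to each entry.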

import csv

out = 'temp.csv'  # output path, as in the question's script

# Python 2 code: files are opened in binary mode and reader.next() is used.
with open('urls.csv', 'rb') as urls_file:
    reader = csv.reader(urls_file)
    reader.next()  # skip the header, won't need that here
    urls = {id: url for id, url in reader}

with open('other.csv', 'rb') as other:
    with open(out, 'wb') as output:
        reader = csv.reader(other)
        writer = csv.writer(output)
        writer.writerow(reader.next() + ['url'])  # read old header, add urls and write out
        for row in reader:
            # write out original row plus url if we can find one
            writer.writerow(row + [urls.get(row[0], '')])
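
For readers on Python 3, the same lookup-then-append idea might look like the sketch below. The function name merge_on_id and its three path parameters are hypothetical, chosen for illustration. It assumes both files start with a header row and that the ID is the first column, as in the question.

```python
import csv


def merge_on_id(urls_path, other_path, out_path):
    """Append a url column to the rows of other_path, matched on the first column (ID)."""
    # Build an ID -> url lookup from the two-column file.
    with open(urls_path, newline='') as urls_file:
        reader = csv.reader(urls_file)
        next(reader)  # skip the "ID,url" header
        url_by_id = {row[0]: row[1] for row in reader if len(row) >= 2}

    with open(other_path, newline='') as other_file, \
            open(out_path, 'w', newline='') as out_file:
        reader = csv.reader(other_file)
        writer = csv.writer(out_file)
        # Copy the old header and add the new column name.
        writer.writerow(next(reader) + ['url'])
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Original row plus its url, or an empty field when no match exists.
            writer.writerow(row + [url_by_id.get(row[0], '')])
```

Calling merge_on_id('urls.csv', 'other.csv', 'temp.csv') would mirror the file names used above. Every row of the first file is kept, which fits the note that not all records have a match.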

Regarding "python - Merge 2 CSV files on a common field using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14090424/
