
python - Merging multiple *.csv, *.txt or *.ascii files on a common field using Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:32:55

I want to merge about 8 *.csv files into one.

Example file:

ID, Average
34, 4.5
35, 5.6
36, 3.4

Another file might be:

ID, Max
34, 6
35, 7
36, 4

The output I need is:

ID, Average, Max
34, 4.5, 6
35, 5.6, 7
36, 3.4, 4

This only half works: it appends all the data into the same two columns.

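import glob

outfile = open('<directory>/<fileName>.csv', 'a')
files = glob.glob(r"<directory>/*.csv")

for y in files:
    newfile = open(y, 'r')
    data = newfile.read()
    newfile.close()
    # writes each file's full contents below the previous one,
    # so every file ends up in the same two columns
    outfile.write(data)

outfile.close()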

How do I append the data as new columns instead of repeating the "ID" field?

Best answer

You have three problems here.

  1. Read in each csv file
  2. Merge on the common field
  3. Write the merged data to a new csv file
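The three steps can be sketched in isolation before looking at a full script. This is a minimal illustration, not the answer's own code: the function names `read_rows`, `merge_on` and `write_rows` are made up for this sketch, and it works on in-memory strings (the sample data from the question) instead of files on disk.

```python
import csv
import io


def read_rows(text):
    """Step 1: parse one CSV document into a list of dicts."""
    # skipinitialspace drops the blank after each comma in "ID, Average"
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))


def merge_on(key, tables):
    """Step 2: combine rows that share the same value of `key`."""
    merged, fields = {}, []
    for rows in tables:
        for row in rows:
            target = merged.setdefault(row[key], {})
            for name, value in row.items():
                target.setdefault(name, value)  # first value seen wins
                if name not in fields:
                    fields.append(name)
    return fields, list(merged.values())


def write_rows(fields, rows):
    """Step 3: serialise the merged rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


averages = "ID, Average\n34, 4.5\n35, 5.6\n36, 3.4\n"
maxima = "ID, Max\n34, 6\n35, 7\n36, 4\n"
fields, rows = merge_on("ID", [read_rows(averages), read_rows(maxima)])
result = write_rows(fields, rows)
print(result)
```

With the two sample files this prints the header `ID,Average,Max` followed by one row per ID, which matches the output requested in the question apart from the spaces after the commas.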

Code

#!/usr/bin/env python3
import argparse
import csv

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='merge csv files on field')
    parser.add_argument('--version', action='version', version='%(prog)s 1.0')
    parser.add_argument('infile', nargs='+', type=str, help='list of input files')
    parser.add_argument('--out', type=str, default='temp.csv', help='name of output file')
    args = parser.parse_args()
    data = {}    # merged rows, keyed by ID
    fields = []  # column names, in the order they are first seen

    for fname in args.infile:
        # newline='' lets the csv module handle line endings itself
        with open(fname, newline='') as df:
            # skipinitialspace copes with headers written as "ID, Average"
            reader = csv.DictReader(df, skipinitialspace=True)
            for line in reader:
                # assuming the field is called ID
                if line['ID'] not in data:
                    data[line['ID']] = line
                else:
                    for k, v in line.items():
                        if k not in data[line['ID']]:
                            data[line['ID']][k] = v
                for k in line:
                    if k not in fields:
                        fields.append(k)

    with open(args.out, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fields, dialect='excel')
        # write the header at the top of the file
        writer.writeheader()
        # write one merged row per ID (the dict values, not its keys)
        writer.writerows(data.values())

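Note that this ignores data that shares a field name: if two files both contain a column with the same name, only the value read first for a given ID is kept.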
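To make that behaviour concrete, here is a small sketch that runs the same merge loop on two in-memory inputs. The second input is hypothetical: it repeats the `Average` column with a different value to provoke the collision.

```python
import csv
import io

first = "ID,Average\n34,4.5\n"
second = "ID,Average,Max\n34,9.9,6\n"  # hypothetical file repeating "Average"

data = {}
for text in (first, second):
    for line in csv.DictReader(io.StringIO(text)):
        if line["ID"] not in data:
            data[line["ID"]] = line
        else:
            for k, v in line.items():
                # an existing key is never overwritten, so 9.9 is discarded
                if k not in data[line["ID"]]:
                    data[line["ID"]][k] = v

merged = dict(data["34"])
print(merged)
```

The merged row keeps `Average` as `4.5` from the first input and only picks up `Max` from the second. If both values matter, one option is to rename the colliding column before merging.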

An alternative to the argument-parser part, using glob to pick up the input files, is:

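#!/usr/bin/env python3
import csv
import glob

if __name__ == '__main__':
    # note: temp.csv also matches ./*.csv if the script is run a second time
    infiles = glob.glob('./*.csv')
    out = 'temp.csv'
    data = {}    # merged rows, keyed by ID
    fields = []  # column names, in the order they are first seen

    for fname in infiles:
        # newline='' lets the csv module handle line endings itself
        with open(fname, newline='') as df:
            # skipinitialspace copes with headers written as "ID, Average"
            reader = csv.DictReader(df, skipinitialspace=True)
            for line in reader:
                # assuming the field is called ID
                if line['ID'] not in data:
                    data[line['ID']] = line
                else:
                    for k, v in line.items():
                        if k not in data[line['ID']]:
                            data[line['ID']][k] = v
                for k in line:
                    if k not in fields:
                        fields.append(k)

    with open(out, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fields, dialect='excel')
        # write the header at the top of the file
        writer.writeheader()
        # write one merged row per ID (the dict values, not its keys)
        writer.writerows(data.values())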
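One edge case is worth a quick look: an ID that appears in some input files but not in others ends up with fewer keys than there are columns. `csv.DictWriter` fills such gaps with its `restval` argument, which defaults to an empty string. The sketch below uses made-up merged data and `NA` as an arbitrary placeholder.

```python
import csv
import io

# Hypothetical merged result: ID 36 never appeared in the file that has "Max".
fields = ["ID", "Average", "Max"]
data = {
    "34": {"ID": "34", "Average": "4.5", "Max": "6"},
    "36": {"ID": "36", "Average": "3.4"},
}

out = io.StringIO()
# restval is written for every field a row is missing
writer = csv.DictWriter(out, fields, restval="NA", lineterminator="\n")
writer.writeheader()
writer.writerows(data.values())
text = out.getvalue()
print(text)
```

Leaving `restval` at its default produces an empty cell instead, which is usually fine for spreadsheets but can be ambiguous when an empty string is also a legitimate value.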

For more on python - Merging multiple *.csv, *.txt or *.ascii files on a common field using Python, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/7519412/
