
Python: unique values per column in each row of a csv file

Reposted. Author: 行者123. Updated: 2023-12-01 03:41:45

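I've been researching this for a long time. Is there a simple way, using Numpy or Pandas or by fixing my code, to get the unique values within each column of a row, where the values are separated by "|"?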

I.e. the data:

"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"

The output should be:

"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"

My broken code:

import csv
import pprint

your_list = csv.reader(open('out.csv'))
your_list = list(your_list)

#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1

It outputs:

id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,

Thanks in advance!

Best answer

numpy/pandas is a bit of overkill for what you can achieve using csv.DictReader and csv.DictWriter with a collections.OrderedDict, e.g.:

import csv
from collections import OrderedDict

# If using Python 2.x - use `open('output.csv', 'wb')` instead (and drop newline='')
# newline='' is what the csv module docs recommend for files opened in Python 3
with open('input.csv', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        for k, v in row.items():
            # OrderedDict.fromkeys drops repeats but keeps first-seen order
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)

Gives you:

"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
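
For anyone who wants to try the idea without touching files, here is a minimal in-memory sketch of the same approach. It is not the answerer's code: the helper names dedupe_field and dedupe_csv are made up for illustration, and it assumes Python 3.7+, where a plain dict keeps insertion order and can stand in for OrderedDict. It also shows why the set() in the question scrambled the order: a set has no defined ordering, while dict.fromkeys keeps the first occurrence of each item in place.

```python
import csv
import io


def dedupe_field(value, sep="|"):
    """Drop repeated items from a sep-delimited string, keeping first-seen order."""
    # Assumption: Python 3.7+, where dict preserves insertion order.
    return sep.join(dict.fromkeys(value.split(sep)))


def dedupe_csv(text, sep="|"):
    """Return CSV text with every data field de-duplicated; the header is copied as-is."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return ""
    writer.writerow(header)
    for row in reader:
        writer.writerow([dedupe_field(cell, sep) for cell in row])
    return out.getvalue()


# Sample data copied from the question.
sample = (
    '"id","fname","lname","education","gradyear","attributes"\n'
    '"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212",'
    '"qa|admin,co|master|NULL|NULL"\n'
    '"2","john","doe","htw","2000","dev"\n'
)
result = dedupe_csv(sample)
print(result, end="")
```

Run against the sample rows from the question, this should print the same three lines shown above. Fields without a "|" pass through unchanged, since splitting them yields a single item.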

Regarding Python unique values per column in each row of a csv file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39509824/
