
python - Split a large csv file based on the date in the first column, Python 3.4.3

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:44:39

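OK, so I found part of the answer I need in the link below, and it works as long as the first column of my csv file is in a format like 2015-03-01,1,2,3,1,3. How can I keep this working when the first column changes to 2015-03-01 00:00:00.000?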

How to split a huge csv file based on content of first column?

import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("largeFile.csv", "r", encoding='utf-16')),
                         lambda row: row[0]):
    with open("%s.txt" % key, "w") as output:
        for row in rows:
            output.write(",".join(row) + "\n")

So I have a large file with about 1.7 million rows...

2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.03,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.03,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1

The program does create a new text document for each day, which is great!

But it stops working when the first column looks like this.

2015-03-01 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-01 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:03.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-03 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-03 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1

It gives me the following error.

Traceback (most recent call last):
  File "C:\Python34\Proj\documents\New folder\dataPullSplit2.py", line 6, in <module>
    with open("%s.txt" % key, "w") as output:
OSError: [Errno 22] Invalid argument: '2015-03-01 00:00:00.000.txt'

Can someone point me in the right direction?

Found Temp Solution

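OK, so by changing "w" to "a" I now append to the file, and by using key[:-13] I was able to cut the timestamp off the file name. It works, but it is slow. How can I improve this, and why does it run so slowly?

Here is the current code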

import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("asdf2.txt", "r", encoding='utf-16')),
                         lambda row: row[0]):

    with open("%s.txt" % key[:-13], "a") as output:
        for row in rows:
            output.write(",".join(row) + "\n")

Best answer

Assuming your files should keep the pattern 2015.01.01, cleaning up the key should work:

key = key.split()[0].replace('-', '.')
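
For context, the OSError in the question most likely comes from the ':' characters in the timestamp, which Windows does not accept in file names; the space and the dots are not the problem. A small sketch of that check follows. `WINDOWS_RESERVED` and `is_safe_filename` are hypothetical names introduced only for illustration, and the set covers the printable reserved characters only.

```python
# Printable characters that Windows rejects in file names.
# (Hypothetical helper for illustration; not part of the original answer.)
WINDOWS_RESERVED = set('<>:"/\\|?*')


def is_safe_filename(name):
    """Return True if name contains none of the reserved characters."""
    return not any(ch in WINDOWS_RESERVED for ch in name)


def shorten_key(key):
    # Same cleanup as the line above: drop the time part, use dots in the date.
    return key.split()[0].replace('-', '.')


raw = '2015-03-01 00:00:00.000'
print(is_safe_filename(raw + '.txt'))               # False: the name contains ':'
print(is_safe_filename(shorten_key(raw) + '.txt'))  # True: '2015.03.01.txt'
```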

Full code:

import csv
from itertools import groupby


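def shorten_key(key):
    # Keep only the date part and use dots, e.g. '2015-03-01 00:00:02.000' -> '2015.03.01'.
    return key.split()[0].replace('-', '.')


# Group consecutive rows by the shortened date, so each day is handled as one group.
for key, rows in groupby(csv.reader(open("asdf2.txt", "r", encoding='utf-16')),
                         lambda row: shorten_key(row[0])):
    # key is already shortened here; applying shorten_key again leaves it unchanged.
    with open("%s.txt" % shorten_key(key), "a") as output:
        for row in rows:
            output.write(",".join(row) + "\n")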
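
The same idea can be wrapped in a function and tried on a small in-memory sample. This is only a sketch: `split_by_day`, `out_dir`, and the sample rows are assumptions made for illustration. Like the code above, it relies on rows for the same day being adjacent in the input, because `groupby` only groups consecutive items.

```python
import csv
import io
import os
import tempfile
from itertools import groupby


def shorten_key(key):
    # Drop the time part and use dots in the date.
    return key.split()[0].replace('-', '.')


def split_by_day(lines, out_dir):
    """Write one <date>.txt file per run of consecutive rows sharing a date.

    Returns the list of file names written, in input order.
    """
    written = []
    reader = csv.reader(lines)
    for day, rows in groupby(reader, lambda row: shorten_key(row[0])):
        name = day + ".txt"
        # "a" appends, so a day that shows up again later is not overwritten.
        with open(os.path.join(out_dir, name), "a") as output:
            for row in rows:
                output.write(",".join(row) + "\n")
        written.append(name)
    return written


# Made-up sample rows in the timestamp format from the question.
sample = io.StringIO(
    "2015-03-01 00:00:01.000,NULL,0,1\n"
    "2015-03-01 00:00:02.000,NULL,0,1\n"
    "2015-03-02 00:00:01.000,NULL,0,1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    print(split_by_day(sample, tmp))  # ['2015.03.01.txt', '2015.03.02.txt']
```

Because the files are opened in append mode, running the script twice against the same output directory duplicates the rows; writing into an empty directory avoids that.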

Quick test:

keys = ['2015-03-01 00:00:02.000', '2015.01.01']

for key in keys:
    print(key.split()[0].replace('-', '.'))

Output:

2015.03.01
2015.01.01
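
On the question of speed: a likely reason the `key[:-13]` workaround is slow is that `groupby` was still keyed on the full timestamp, so a new group (and a new `open(..., "a")` call) starts every time the timestamp changes. Grouping on the shortened key opens each daily file once per run of rows instead. A rough sketch that counts the groups in both cases, using made-up sample values and a hypothetical `count_groups` helper:

```python
from itertools import groupby

# Made-up timestamps in the format from the question.
stamps = [
    '2015-03-01 00:00:01.000',
    '2015-03-01 00:00:02.000',
    '2015-03-02 00:00:01.000',
    '2015-03-02 00:00:02.000',
    '2015-03-02 00:00:03.000',
]


def shorten_key(key):
    # Drop the time part and use dots in the date.
    return key.split()[0].replace('-', '.')


def count_groups(values, keyfunc=None):
    """Number of groups groupby yields, i.e. how often a file would be opened."""
    return sum(1 for _ in groupby(values, keyfunc))


print(count_groups(stamps))               # 5: one group per distinct timestamp
print(count_groups(stamps, shorten_key))  # 2: one group per day
```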

Regarding "python - Split a large csv file based on the date in the first column, Python 3.4.3", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41864058/
