
python - Parsing text files using Python

Reposted. Author: 太空狗. Last updated: 2023-10-29 17:13:47

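I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All of the text files have a 4-line header that needs to be stripped out. The data lines use various delimiters, including " (quote), - (dash), : (colon), and blank space. Handling all these different delimiters in C++ was painful, so I decided to try Python, where this should be relatively easy compared with C/C++.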

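I wrote a piece of code to test it on a single line of data, and it works. However, I could not get it to work on the actual file. To parse a single line I used a string object and its replace method. It looks like my current implementation reads the text file as a list, and list objects have no replace method.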

Being a novice in Python, I am stuck at this point. Any input would be appreciated!

Thanks!

# function for parsing the data: applies every replacement in dic to one string
def data_parser(text, dic):
    for i, j in dic.items():  # items() works on both Python 2 and Python 3
        text = text.replace(i, j)
    return text

# open input/output files

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

my_text = inputfile.readlines()[4:]  # reads the whole text file as a list of lines, skipping the first 4


# sample text string, just to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: the 0-, 1-, etc. entries split the dash-delimited date block while leaving negative numbers unaffected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

txt = data_parser(my_text, reps)
outputfile.writelines(txt)

inputfile.close()
outputfile.close()

Best answer

I would use a for loop to iterate over the lines of the text file:

for line in my_text:
    outputfile.write(data_parser(line, reps))
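
The underlying cause is that readlines() returns a list of strings, while replace is a method of individual strings only. A minimal sketch of the failure and of the per-line fix follows. The shortened reps dictionary and the two sample lines are made up for illustration and are not the real data.

```python
def data_parser(text, dic):
    """Apply every replacement in dic to a single string."""
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

# Hypothetical, shortened replacement table and sample lines.
reps = {'"': '', ' ': ','}
lines = ['"a b",1\n', '"c d",2\n']  # the shape readlines() returns: a list of str

# Passing the whole list fails, because list has no replace method.
try:
    data_parser(lines, reps)
except AttributeError as exc:
    error_message = str(exc)

# Passing one string at a time works.
parsed = [data_parser(line, reps) for line in lines]
```

Each element of parsed is an ordinary string, so it can be written to the output file directly.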

If you want to read the file line by line instead of loading the whole file at the start of the script, you can do this:

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: the 0-, 1-, etc. entries split the dash-delimited date block while leaving negative numbers unaffected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for i in range(4):
    next(inputfile)  # skip the first four header lines
for line in inputfile:
    outputfile.write(data_parser(line, reps))

inputfile.close()
outputfile.close()
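
As a variation that is not part of the original answer, the same cleanup can be sketched with regular expressions and with statements, which close both files even if an error occurs. The names parse_line and convert and the header_lines parameter are hypothetical. The lookbehind assumes, as the reps dictionary does, that a dash directly following a digit is always a date separator.

```python
import re
from itertools import islice

def parse_line(line):
    """Turn one raw data line into comma-separated fields."""
    line = line.replace('"', '')            # drop quotes, which also unquotes "NAN"
    line = re.sub(r'(?<=\d)-', ',', line)   # a dash right after a digit splits the date
    return re.sub(r'[ :]', ',', line)       # spaces and colons become commas

def convert(src_path, dst_path, header_lines=4):
    """Copy src_path to dst_path, skipping the header and cleaning each line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in islice(src, header_lines, None):
            dst.write(parse_line(line))
```

A call such as convert('test.dat', 'test.csv') would then replace the open, loop, and close steps above. Negative numbers keep their sign because their dash follows a comma rather than a digit.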

This post on "python - Parsing text files using Python" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/11936967/
