python - Cleaning up a messy CSV with Python, as saved in Excel

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:12:29

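I'm quite new to Python, and I'm trying to read a load of CSV files (about 100) into a single DataFrame. However, the CSV files are rather messy: they use multiple delimiters, among other problems. I searched this site, but nothing I found worked. I tried readlines and pd.read with many options, but all I get is errors or empty DataFrames. When I open a CSV in Excel it looks fine, and when I save it as a UTF-8 CSV everything works. Doing that for every file is a lot of work, though, even with a macro. Is there a way to replicate this process in Python code (e.g. in2csv)? Below is part of a CSV file I need to use, followed by part of a CSV file saved from Excel (which works). To me the main difference is the space versus comma delimiter, but changing that in pd.read didn't help. Many thanks!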

Messy CSV:

"Device name:UU-WGB-JV_1  Device type:SUN2000  Device address:IP Address=62.72.193.88   Device No.=2  Date:2018-01-23 08:51:23  "   
"Generated On" "Device Status" "Energy Yield of Current Day (kWh)" "Inv. efficiency"(%) "Total Energy Yield (kWh)" "Input Power (kW)" "Active Power (kW)" "Reactive Power (kVar)" "Power Factor" "Grid Frequency (Hz)" "Grid A Current (A)" "Grid B Current (A)" "Grid C Current (A)" "Grid A Phase Voltage (V)" "Grid B Phase Voltage (V)" "Grid C Phase Voltage (V)" "PV1 Input Current (A)" "PV2 Input Current (A)" "PV3 Input Current (A)" "PV4 Input Current (A)" "PV5 Input Current (A)" "PV6 Input Current (A)" "PV1 Input Voltage (V)" "PV2 Input Voltage (V)" "PV3 Input Voltage (V)" "PV4 Input Voltage (V)" "PV5 Input Voltage (V)" "PV6 Input Voltage (V)" "Cabinet Temperature (℃)"
"2017-12-22 00:00:00 " "Idle: No irradiation" "0.00" "0.00" "45803.34" "0.000" "0.000" "0.000" "0.000" "0.00" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0"
"2017-12-22 00:15:00 " "Idle: No irradiation" "0.00" "0.00" "45803.34" "0.000" "0.000" "0.000" "0.000" "0.00" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0" "0.0"

Good CSV:

Device name:UU-CB_1  Device type:SUN2000  Device address:IP Address=62.140.137.136   Device No.=1  Date:2018-01-22 13:31:51  ,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Generated On,Device Status,Energy Yield of Current Day (kWh),Inv. efficiency(%),Total Energy Yield (kWh),Input Power (kW),Active Power (kW),Reactive Power (kVar),Power Factor,Grid Frequency (Hz),Grid A Current (A),Grid B Current (A),Grid C Current (A),Grid A Phase Voltage (V),Grid B Phase Voltage (V),Grid C Phase Voltage (V),PV1 Input Current (A),PV2 Input Current (A),PV3 Input Current (A),PV4 Input Current (A),PV5 Input Current (A),PV6 Input Current (A),PV1 Input Voltage (V),PV2 Input Voltage (V),PV3 Input Voltage (V),PV4 Input Voltage (V),PV5 Input Voltage (V),PV6 Input Voltage (V),Cabinet Temperature (℃)
"2017-11-01 00:00:00 ",Idle: No irradiation,0,-,36670.07,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Best answer

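It seems the first line (the title) can't be salvaged, since it contains spaces and unquoted fields. It could be fixed with a specific regular expression, but I would just skip it.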
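If the device details in that first line are wanted rather than discarded, one possible approach is sketched below. It assumes, based only on the sample shown in the question, that the fields are separated by two or more spaces and that each label ends at the first ":" (or at "=" when there is no colon, as in "Device No.=2"). The function name `parse_device_line` is hypothetical, and this is not necessarily the regular expression the answer had in mind.

```python
import re


def parse_device_line(line):
    """Split the device metadata line into a dict of label -> value.

    Assumption: fields are separated by runs of two or more whitespace
    characters, as in the sample line from the question.
    """
    # Drop the newline, the surrounding double quotes, and padding spaces.
    text = line.strip().strip('"').strip()
    info = {}
    for field in re.split(r"\s{2,}", text):
        # Most labels end at the first ':'; "Device No.=2" uses '=' instead.
        key, sep, value = field.partition(":")
        if not sep:
            key, sep, value = field.partition("=")
        info[key.strip()] = value.strip()
    return info


sample = ('"Device name:UU-WGB-JV_1  Device type:SUN2000  '
          'Device address:IP Address=62.72.193.88   Device No.=2  '
          'Date:2018-01-23 08:51:23  "   \n')
device_info = parse_device_line(sample)
print(device_info["Device name"], device_info["Date"])
```

Keeping the result as a dictionary makes it easy to attach, say, the device name to every data row later on.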

The remaining lines aren't CSV, but they consist of quoted tokens separated by spaces, which is a piece of cake for shlex.split:

import csv
import shlex

with open("input.csv") as f:
    title = next(f)  # discard title line

    with open("output.csv", "w", newline="", encoding="utf-8") as fw:
        cw = csv.writer(fw, delimiter=";")  # may be changed to ","
        # shlex.split turns each space-separated, quoted line into a list of fields
        cw.writerows(shlex.split(l) for l in f)

Output:

Generated On;Device Status;Energy Yield of Current Day (kWh);Inv. efficiency(%);Total Energy Yield (kWh);Input Power (kW);Active Power (kW);Reactive Power (kVar);Power Factor;Grid Frequency (Hz);Grid A Current (A);Grid B Current (A);Grid C Current (A);Grid A Phase Voltage (V);Grid B Phase Voltage (V);Grid C Phase Voltage (V);PV1 Input Current (A);PV2 Input Current (A);PV3 Input Current (A);PV4 Input Current (A);PV5 Input Current (A);PV6 Input Current (A);PV1 Input Voltage (V);PV2 Input Voltage (V);PV3 Input Voltage (V);PV4 Input Voltage (V);PV5 Input Voltage (V);PV6 Input Voltage (V);Cabinet Temperature (℃)
2017-12-22 00:00:00 ;Idle: No irradiation;0.00;0.00;45803.34;0.000;0.000;0.000;0.000;0.00;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0
2017-12-22 00:15:00 ;Idle: No irradiation;0.00;0.00;45803.34;0.000;0.000;0.000;0.000;0.00;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0;0.0

The file now opens correctly in Excel (note that, depending on the version, Excel expects either a comma or a semicolon as the default delimiter).
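Since the question involves about 100 such files, the same idea can be extended to a whole folder. The following is a minimal sketch, not part of the original answer: the function names `read_messy_rows` and `merge_messy_csvs`, the added "Source file" column, and the default UTF-8 input encoding are all assumptions for illustration. It also assumes every file shares the same header and that no field contains an apostrophe or backslash, which shlex.split would treat as shell quoting.

```python
import csv
import shlex
from pathlib import Path


def read_messy_rows(path, encoding="utf-8"):
    """Return (header, data_rows) for one space-separated, quoted file."""
    with open(path, encoding=encoding) as f:
        next(f, None)  # skip the device metadata line, if present
        rows = [shlex.split(line) for line in f if line.strip()]
    if not rows:
        return [], []
    return rows[0], rows[1:]


def merge_messy_csvs(src_dir, dst_path, pattern="*.csv", encoding="utf-8"):
    """Write the data rows of every matching file into one regular CSV.

    Returns the number of data rows written. The header is taken from
    the first non-empty file (assumption: all files share it).
    """
    dst = Path(dst_path).resolve()
    header_written = False
    n_rows = 0
    with open(dst, "w", newline="", encoding="utf-8") as fw:
        writer = csv.writer(fw)  # pass delimiter=";" if your Excel expects it
        for path in sorted(Path(src_dir).glob(pattern)):
            if path.resolve() == dst:
                continue  # never read the output file back in
            header, rows = read_messy_rows(path, encoding)
            if not header:
                continue  # empty or metadata-only file
            if not header_written:
                writer.writerow(["Source file"] + header)
                header_written = True
            writer.writerows([path.name] + row for row in rows)
            n_rows += len(rows)
    return n_rows
```

A call such as `merge_messy_csvs("exports", "merged.csv")` (both paths hypothetical) would produce a single file that a standard CSV reader can load in one step.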

Regarding "python - Cleaning up a messy CSV with Python, as saved in Excel", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48404294/
