python - Pandas DataFrame not reading data

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:43:55

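I'm running into a pandas problem that I didn't have a few months ago. I'm trying to take a data set from a file the user selects (using tkinter) and put it into a pandas DataFrame. The data looks like this: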

1.000000    03/27/2016   13:29:26.098   1431.778943 0.092089
1.000000 03/27/2016 13:29:26.298 1432.410517 0.078570
1.000000 03/27/2016 13:29:26.498 1431.905258 0.089538
1.000000 03/27/2016 13:29:26.698 1431.399999 0.080930
5.000000 03/28/2016 00:00:00.098 1289.422164 0.392945
25.000000 03/28/2016 00:00:00.298 1289.295849 0.145016
25.000000 03/28/2016 00:00:00.498 1289.295849 0.183149
25.000000 03/28/2016 00:00:00.698 1288.790590 0.175114
26.000000 03/28/2016 00:25:16.698 1302.053644 0.162170
.....

There are always 5 columns, but a data set typically has 200,000 to 800,000 rows.

Here is my code:

import pandas as pd
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename() #User selects file

file = pd.read_table(file_path, index_col=False)
df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)

print(file_path)
print(file)
print(df)

print(file_path) prints the correct path, print(file) shows all the correct data, and print(df) shows the following:

 Measurement Date Time  CO2 Flow
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
.......

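I have done exactly the same thing before, but I lost the script I was writing and had to start over. It used to work fine, and I'm not sure what changed. I have tried several things to fix it: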

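  1. Changing pd.read_table to pd.io.parsers.read_table
  2. Changing the index=, dtype=, and other arguments of pd.DataFrame
  3. Converting the file to .csv and using pd.read_csv
  4. Shortening the file significantly
  5. Creating a pd.Series with a single column and printing it, but all data points were still NaN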

I can easily generate a set of random data and put it into a pd.DataFrame without any problem (I used df2 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e']) in IPython and it displayed correctly).

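I created a numpy array from the same data and it worked fine. I want to use pandas because I think it will make my analysis easier in the long run. I really hope I'm missing something small, but I've been working on this for a while, so I'm willing to try anything.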

Best answer

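According to the read_table documentation, read_table already returns a DataFrame, so the variable file already holds the DataFrame you are after.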

Try this:

In [71]: f = pd.read_table('table.txt', names=['Measurement', 'Date', 'Time','CO2', 'Flow'])

In [72]: f
Out[72]:
Measurement Date Time CO2 Flow
0 1 03/27/2016 13:29:26.098 1431.778943 0.092089
1 1 03/27/2016 13:29:26.298 1432.410517 0.078570
2 1 03/27/2016 13:29:26.498 1431.905258 0.089538
3 1 03/27/2016 13:29:26.698 1431.399999 0.080930
4 5 03/28/2016 00:00:00.098 1289.422164 0.392945
5 25 03/28/2016 00:00:00.298 1289.295849 0.145016
6 25 03/28/2016 00:00:00.498 1289.295849 0.183149
7 25 03/28/2016 00:00:00.698 1288.790590 0.175114
8 26 03/28/2016 00:25:16.698 1302.053644 0.162170
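
For anyone who wants to reproduce the call above without the original file, here is a minimal sketch. It makes two assumptions that are not in the question: the rows are held in an in-memory string (SAMPLE, a hypothetical stand-in for table.txt), and the fields are separated by whitespace, so sep=r"\s+" is passed explicitly. If your file is tab-delimited, the default separator of read_table should already work.

```python
import io

import pandas as pd

# Three rows copied from the question; SAMPLE stands in for the real file.
SAMPLE = """\
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
1.000000 03/27/2016 13:29:26.298 1432.410517 0.078570
5.000000 03/28/2016 00:00:00.098 1289.422164 0.392945
"""
COLUMNS = ["Measurement", "Date", "Time", "CO2", "Flow"]

# Passing names= tells the parser the file has no header row,
# so every line is kept as data and the columns get these labels.
# sep=r"\s+" is an assumption about the delimiter (any run of whitespace).
named = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", names=COLUMNS)

print(named)
print(named.dtypes)
```

Date and Time come back as plain strings here; parsing them into timestamps is a separate step and is left out of this sketch.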

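So why didn't you get the result you wanted? Look at the table after it has been read: it does not have the column names you expect. The first row of data was used as the header instead.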

In [77]: file = pd.read_table('table.txt', index_col=False)

In [78]: file
Out[78]:
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
0 1 03/27/2016 13:29:26.298 1432.410517 0.078570
1 1 03/27/2016 13:29:26.498 1431.905258 0.089538
2 1 03/27/2016 13:29:26.698 1431.399999 0.080930
3 5 03/28/2016 00:00:00.098 1289.422164 0.392945
4 25 03/28/2016 00:00:00.298 1289.295849 0.145016
5 25 03/28/2016 00:00:00.498 1289.295849 0.183149
6 25 03/28/2016 00:00:00.698 1288.790590 0.175114
7 26 03/28/2016 00:25:16.698 1302.053644 0.162170
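
The header behaviour can be checked on a small sample. This is only a sketch under the same assumptions as before: SAMPLE is a hypothetical in-memory stand-in for the file, and the delimiter is assumed to be whitespace.

```python
import io

import pandas as pd

# Three rows copied from the question; SAMPLE stands in for the real file.
SAMPLE = """\
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
1.000000 03/27/2016 13:29:26.298 1432.410517 0.078570
5.000000 03/28/2016 00:00:00.098 1289.422164 0.392945
"""

# Default behaviour: the first line is treated as the header,
# so its values become the column labels and one data row is lost.
no_names = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(list(no_names.columns))
print(len(no_names))

# header=None keeps every line as data and labels the columns 0..4.
header_none = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None)
print(list(header_none.columns))
print(len(header_none))
```

Either way, none of the resulting column labels is 'Measurement', 'Date', 'Time', 'CO2', or 'Flow'.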

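So when you call the DataFrame constructor with an existing DataFrame and a list of column names, you get all nulls: the constructor selects columns by those names, and the input DataFrame has no columns with the names you gave.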

In [80]: df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)

In [81]: df
Out[81]:
Measurement Date Time CO2 Flow
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
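
To see both the problem and one possible fix side by side, here is a small sketch. As before, SAMPLE is a hypothetical in-memory stand-in for the file and the whitespace delimiter is an assumption. Assigning to .columns is an alternative to passing names= at read time, not something the answer above prescribes.

```python
import io

import pandas as pd

# Three rows copied from the question; SAMPLE stands in for the real file.
SAMPLE = """\
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
1.000000 03/27/2016 13:29:26.298 1432.410517 0.078570
5.000000 03/28/2016 00:00:00.098 1289.422164 0.392945
"""
COLUMNS = ["Measurement", "Date", "Time", "CO2", "Flow"]

raw = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None)

# The constructor does not rename columns: it looks up the requested
# labels in raw, finds none of them, and fills every cell with NaN.
reindexed = pd.DataFrame(data=raw, columns=COLUMNS)
print(reindexed)

# Renaming is done by assigning new labels to a copy of the frame.
fixed = raw.copy()
fixed.columns = COLUMNS
print(fixed)
```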

Regarding "python - Pandas DataFrame not reading data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36520217/
