
python - Pandas read_csv() : keep 0 as 0 (not convert it to NaN)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:11:19

I am trying to read a CSV file; here is a sample of it:

datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:01:00,2,0,0,2.407,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:02:50,2,0,0,2.207,-17.354,0,8192
2012-10-27 15:02:51,1,0,0,2.207,-17.358,0,-8152
2012-10-27 15:02:52,1,0,0,2.207,-17.358,0,648
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-10-27 15:06:04,0,51.195182,4.44427,2.344,-17.289,0,587
2012-12-05 09:21:34,,,,,42.960,1,16430
2012-12-05 09:21:35,,,,,42.962,1,3597

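The problem is that in columns containing only integers, 0 is converted to NaN (for example the columns "check" and "status": they hold only integers, but are read as float because they also contain genuinely missing values). I only want empty fields converted to NaN, not zeros.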

This is what I get:

>>> pd.read_clipboard(sep=',', parse_dates=True, index_col=0)
                     check        lat       lon  co_alpha      atn  status      bc
datetime
2012-10-27 15:00:59      2   0.000000  0.000000     2.427      NaN     NaN     NaN
2012-10-27 15:01:00      2   0.000000  0.000000     2.407      NaN     NaN     NaN
2012-10-27 15:02:49      2   0.000000  0.000000     2.207  -17.358     NaN  -16162
2012-10-27 15:02:50      2   0.000000  0.000000     2.207  -17.354     NaN    8192
2012-10-27 15:02:51      1   0.000000  0.000000     2.207  -17.358     NaN   -8152
2012-10-27 15:02:52      1   0.000000  0.000000     2.207  -17.358     NaN     648
2012-10-27 15:06:03    NaN  51.195076  4.444407     2.349  -17.289     NaN    4909
2012-10-27 15:06:04    NaN  51.195182  4.444270     2.344  -17.289     NaN     587
2012-12-05 09:21:34    NaN        NaN       NaN       NaN   42.960       1   16430
2012-12-05 09:21:35    NaN        NaN       NaN       NaN   42.962       1    3597

So the "check" and "status" columns end up with many NaN values where the file has 0. In the "lat" and "lon" columns, 0 is not converted to NaN.

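  • Using na_values='' together with keep_default_na=False does not help. Is there a way to specify that the integer 0 should not be converted to NaN? Or is this a bug?

  • I could use the dtype keyword to declare specific columns as int. That keeps 0 as 0, but these columns also contain real NaNs (empty fields), and those would then be converted to 0 as well, because an int column cannot hold NaN. For that reason I have to keep all columns as float.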
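As a side note on the second bullet: pandas releases much newer than the 0.10.x discussed in this thread offer a nullable integer dtype, spelled "Int64" with a capital I, which can hold integers and missing values in the same column. It is not part of the original question or answer. The sketch below assumes such a newer pandas, uses a shortened three-column version of the sample, and reads it with read_csv from an in-memory string instead of the clipboard.

```python
import io

import pandas as pd

# Shortened, illustrative version of the sample (only three columns kept).
csv_text = """datetime,check,status
2012-10-27 15:00:59,2,
2012-10-27 15:06:03,0,0
2012-12-05 09:21:34,,1
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    # Nullable integer dtype: zeros stay 0, empty fields become <NA>.
    dtype={"check": "Int64", "status": "Int64"},
)

print(df.dtypes)
print(df)
```

With this dtype the columns stay integer-valued, so there is no need to fall back to float just to represent the empty fields.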


Edit: after upgrading to pandas 0.10.1, it works as expected, even without specifying keep_default_na or na_values:

>>> pd.read_clipboard(sep=',', parse_dates=True, index_col=0)
                     check        lat       lon  co_alpha      atn  status      bc
datetime
2012-10-27 15:00:59      2   0.000000  0.000000     2.427      NaN     NaN     NaN
2012-10-27 15:01:00      2   0.000000  0.000000     2.407      NaN     NaN     NaN
2012-10-27 15:02:49      2   0.000000  0.000000     2.207  -17.358       0  -16162
2012-10-27 15:02:50      2   0.000000  0.000000     2.207  -17.354       0    8192
2012-10-27 15:02:51      1   0.000000  0.000000     2.207  -17.358       0   -8152
2012-10-27 15:02:52      1   0.000000  0.000000     2.207  -17.358       0     648
2012-10-27 15:06:03      0  51.195076  4.444407     2.349  -17.289       0    4909
2012-10-27 15:06:04      0  51.195182  4.444270     2.344  -17.289       0     587
2012-12-05 09:21:34    NaN        NaN       NaN       NaN   42.960       1   16430
2012-12-05 09:21:35    NaN        NaN       NaN       NaN   42.962       1    3597
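To check the default behaviour on your own installation without going through the clipboard, a minimal sketch is to feed a few rows of the sample to read_csv from an in-memory string and count zeros and missing values separately. Using io.StringIO and read_csv in place of read_clipboard is a substitution made here for reproducibility, and the row selection is illustrative.

```python
import io

import pandas as pd

# Four rows taken from the sample in the question.
SAMPLE = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=True, index_col=0)

# Count genuine zeros and missing values in the two integer-like columns.
zeros = (df[["check", "status"]] == 0).sum()
missing = df[["check", "status"]].isna().sum()

print(pd.__version__)
print(zeros)
print(missing)
```

If zeros were being turned into NaN, as in the first output of the question, the zero counts would drop to 0 and the missing counts would rise accordingly.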

Best answer

You have to set keep_default_na to False first:

df = pd.read_clipboard(sep=',', index_col=0, keep_default_na=False, na_values='')

In [2]: df
Out[2]:
                     check        lat       lon  co_alpha      atn  status      bc
datetime
2012-10-27 15:00:59      2   0.000000  0.000000     2.427      NaN     NaN     NaN
2012-10-27 15:01:00      2   0.000000  0.000000     2.407      NaN     NaN     NaN
2012-10-27 15:02:49      2   0.000000  0.000000     2.207  -17.358       0  -16162
2012-10-27 15:02:50      2   0.000000  0.000000     2.207  -17.354       0    8192
2012-10-27 15:02:51      1   0.000000  0.000000     2.207  -17.358       0   -8152
2012-10-27 15:02:52      1   0.000000  0.000000     2.207  -17.358       0     648
2012-10-27 15:06:03      0  51.195076  4.444407     2.349  -17.289       0    4909
2012-10-27 15:06:04      0  51.195182  4.444270     2.344  -17.289       0     587
2012-12-05 09:21:34    NaN        NaN       NaN       NaN   42.960       1   16430
2012-12-05 09:21:35    NaN        NaN       NaN       NaN   42.962       1    3597
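The same two options can be tried without the clipboard. The sketch below is a variation on the call above: it passes the options to read_csv on an in-memory string, writes na_values as a one-element list rather than a bare string, and uses a reduced set of columns and rows chosen for illustration.

```python
import io

import pandas as pd

# Reduced, illustrative version of the sample data.
csv_text = """datetime,check,atn,status
2012-10-27 15:00:59,2,,
2012-10-27 15:02:49,2,-17.358,0
2012-10-27 15:06:03,0,-17.289,0
2012-12-05 09:21:34,,42.960,1
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    keep_default_na=False,  # discard pandas' built-in list of NA strings
    na_values=[""],         # treat only the empty field as missing
)

print(df)
```

Only the empty fields come back as NaN here; the literal 0 entries in "check" and "status" are kept as numbers.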

From the read_table docstring:

keep_default_na : bool, default True
    If na_values are specified and keep_default_na is False the default NaN
    values are overridden, otherwise they're appended to

na_values : list-like or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values
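To make the interaction of the two parameters concrete, here is a small sketch on made-up data (the column names "code" and "reading" and the -999 sentinel are hypothetical, not from the question). It compares the defaults, the override form from the answer, and the dict form that sets NA values per column.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a legitimate text value, -999 a sentinel.
csv_text = "code,reading\nNA,-999\n,3\nBE,0\n"

# Defaults: built-in NA strings such as "NA" and "" are both read as missing.
default = pd.read_csv(io.StringIO(csv_text))

# Override: only the empty string is missing, so the literal "NA" survives.
strict = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=[""]
)

# Dict form: -999 is missing in "reading" only, appended to the defaults.
per_column = pd.read_csv(
    io.StringIO(csv_text), na_values={"reading": ["-999"]}
)

print(default)
print(strict)
print(per_column)
```

In all three frames the 0 in "reading" stays a number; what changes is which strings are treated as missing.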

Regarding python - Pandas read_csv() : keep 0 as 0 (not convert it to NaN), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14727469/
