
python - Regular expression for na_values with pandas.read_csv

Reposted. Author: 行者123. Updated: 2023-12-04 22:05:51

I want to read a file like this one with pandas.read_csv:

1891, 91920,  7,       628,249, 59,51.0, 0.026, 0.028,   NaN,   NaN,   NaN,   NaN,   NaN,  0.156, 0.071,    NaN,   NaN,    NaN,   NaN,    NaN,   NaN,    NaN,   NaN,   21,500,   21,43.8, 0.005, 0.619,  NaN,45.6, 0.048, 0.053,   NaN,   NaN,   NaN,   NaN,   NaN, -0.180, 0.088,   20, 0.012, 1.107,  NaN, NaN,   NaN,   NaN,   NaN,   NaN,   NaN,   NaN,   NaN,    NaN,   NaN,  NaN,   NaN,   NaN,  NaN,     NaN,     NaN,     NaN
1891, 91920, 16, 628,135, 22,41.2, 0.093, 0.087, NaN, NaN, NaN, NaN, NaN, 0.416, 0.212, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,23.3, 0.021, 2.023, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
1891, 91920, 3, 628, 28, 39,47.0, 0.041, 0.044, NaN, NaN, NaN, NaN, NaN, -0.006, 0.064, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,37.5, 0.009, 0.964, NaN,45.3, 0.054, 0.055, NaN, NaN, NaN, NaN, NaN, -0.838, 0.228, 20, 0.013, 1.193, NaN,51.8, 0.025, 0.026, NaN, NaN, NaN, NaN, NaN, -0.021, 0.054, 21, 0.005, 0.540, NaN, NaN, NaN, NaN
1891, 91920, 6, 628,276, 20,40.0, 0.118, 0.101, NaN, NaN, NaN, NaN, NaN, -0.767, 0.558, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,26.7, 0.032, 2.982, NaN,41.0, 0.088, 0.089, NaN, NaN, NaN, NaN, NaN, -0.141, 0.233, 20, 0.024, 2.074, NaN,46.2, 0.053, 0.049, NaN, NaN, NaN, NaN, NaN, 0.080, 0.034, 21, 0.012, 1.187, NaN, NaN, NaN, NaN

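I run into trouble reading it because of the NaN values. If the file were a plain csv file (separated by commas only), there would be no problem, but it also contains spaces. When I read it with: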
df = pd.read_csv(file,index_col=None, header=None)

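the columns containing NaN are read as strings, apparently because of the spaces. If the padding were always the same width, the problem would be easy to solve. I could use: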
df = pd.read_csv(file,index_col=None, header=None, na_values = "   NaN")

That would solve it, but the padding differs between columns. Some have 4 spaces before NaN, others have 6, and so on.
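A minimal reproduction of the problem, assuming a made-up two-row excerpt (sample) in the same shape as the data above, read from an in-memory io.StringIO instead of a file. With the default C engine, padded NaN strings are not recognised as missing values, and a fixed-width na_values string only catches the exact padding it spells out:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt: note the different padding before each NaN.
sample = (
    "1891, 91920,   NaN, 0.156\n"
    "1891, 91920, NaN, 0.416\n"
)

# Default read: "   NaN" and " NaN" are not in the default NA list
# (the unpadded "NaN" is), so column 2 is kept as strings.
df_default = pd.read_csv(io.StringIO(sample), index_col=None, header=None)

# A fixed-width na_values only matches that exact padding ("   NaN"),
# so the second row's " NaN" is still left as a string.
df_fixed = pd.read_csv(io.StringIO(sample), index_col=None, header=None,
                       na_values="   NaN")

print(df_default.dtypes)
print(df_fixed[2].tolist())
```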

So my question is: is there a way to specify na_values as a regular expression, something like na_values = "\s+ NaN"?
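As far as I know, na_values only does exact string matching and does not accept regular expressions. One workaround, sketched here on a hypothetical excerpt, is to read the file as-is, apply the regex afterwards with DataFrame.replace(..., regex=True), and then convert the affected columns with pd.to_numeric. This is an alternative, not part of the answer below; the helper to_numbers is made up for illustration:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt: padded NaN values mixed with padded numbers.
sample = (
    "1891, 91920,   NaN, 0.156\n"
    "1891, 91920, 0.093,    NaN\n"
)

df = pd.read_csv(io.StringIO(sample), index_col=None, header=None)

# Regex stand-in for na_values: any string cell that is only whitespace
# around "NaN" becomes a real missing value; numeric columns are untouched.
df = df.replace(r"^\s*NaN\s*$", np.nan, regex=True)


def to_numbers(col: pd.Series) -> pd.Series:
    """Hypothetical helper: strip padding from string columns, then parse."""
    if pd.api.types.is_numeric_dtype(col):
        return col
    return pd.to_numeric(col.str.strip())


# Apply column by column (DataFrame.apply defaults to axis=0).
df = df.apply(to_numbers)
print(df.dtypes)
```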

Best answer

Try this:

df = pd.read_csv(file, engine='python', index_col=None, sep=r',\s*', header=None)

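The separator ',\s*' consumes each comma together with any whitespace that follows it, so the padded values reach the parser as plain NaN, which is already one of the default missing-value strings. The parsing engine is set to python to avoid the ParserWarning you would otherwise get: the default C engine does not support regular-expression separators, so pandas falls back to the python engine on its own and warns about it.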
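A quick check of the answer's approach on a hypothetical excerpt, read from io.StringIO rather than a file. The second call uses skipinitialspace=True, a documented read_csv option that skips whitespace right after the delimiter; it is not part of the original answer, but it should give the same result while keeping the default C engine:

```python
import io

import pandas as pd

# Hypothetical excerpt with uneven padding after the commas.
sample = (
    "1891, 91920,  7,       628,   NaN, 0.156\n"
    "1891, 91920, 16, 628, NaN, 0.416\n"
)

# The answer's approach: a comma plus any following whitespace is one
# delimiter, so each padded "NaN" arrives as "NaN", a default NA string.
df = pd.read_csv(io.StringIO(sample), engine="python", index_col=None,
                 sep=r",\s*", header=None)

# Alternative (not from the answer): let the C parser skip the spaces
# that follow each delimiter.
df_c = pd.read_csv(io.StringIO(sample), index_col=None, header=None,
                   skipinitialspace=True)

print(df.dtypes)
print(df.equals(df_c))
```

The regex separator forces the slower python engine; skipinitialspace avoids that, but it only handles whitespace after the delimiter, whereas a regex separator can be adapted to other irregular layouts.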

Regarding python - Regular expression for na_values with pandas.read_csv, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40493759/
