python - Pandas csv reader creates NaN index

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:40:39

I have a CSV file with the following content:

A: 12, B: 14
A: 1, B: 4
A: 2, B: 1
A: 21, B: 41

I can separate the columns with a regular expression:

import pandas as pd

df = pd.read_csv("test.csv", sep=r":\s*|,\s*", names=["dummy1", "A", "dummy2", "B"], engine="python")
print(df)

Output:

  dummy1   A dummy2   B
0      A  12      B  14
1      A   1      B   4
2      A   2      B   1
3      A  21      B  41

To avoid creating the useless columns, I tried the following approach:

import pandas as pd

df1 = pd.read_csv("test.csv", sep=r"A:\s*|,\s*B:\s*", names=["A", "B"], engine="python")
print(df1)

But now the index contains only NaN values:

      A   B
NaN  12  14
NaN   1   4
NaN   2   1
NaN  21  41

Why does this happen, and how can I prevent it?

Best answer

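The pandas.read_csv function accepts an index_col parameter that specifies which column(s) to use as the row labels (index) of the DataFrame. Set this parameter explicitly to an int or a sequence of ints, since it defaults to None.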

index_col : int or sequence or False, default None

Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
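The NaN index and one possible way around it can be sketched as follows. The reasoning assumed here is that the pattern matches A: at the very start of every line, so each row splits into three fields, the first of them empty; with only two names supplied, pandas uses that extra leading field as the index, and the empty strings show up as NaN. The sketch reads the sample data from an in-memory string instead of test.csv, gives the empty field a placeholder name (dummy is a made-up name, not something from the original post), and keeps only A and B through usecols.

```python
import io
import re

import pandas as pd

# Sample data from the question, held in memory instead of test.csv.
DATA = "A: 12, B: 14\nA: 1, B: 4\nA: 2, B: 1\nA: 21, B: 41\n"
SEP = r"A:\s*|,\s*B:\s*"

# The pattern matches "A:" at the start of the line, so the split
# yields an empty first field followed by the two numbers.
fields = re.split(SEP, "A: 12, B: 14")
print(fields)

# Name all three fields and keep only the two real ones.
# "dummy" is a hypothetical placeholder name for the empty field.
df1 = pd.read_csv(
    io.StringIO(DATA),
    sep=SEP,
    names=["dummy", "A", "B"],
    usecols=["A", "B"],
    engine="python",
)
print(df1)
```

Because the number of names now matches the number of fields per row, pandas has no leftover column to promote to the index and falls back to the default 0..n-1 row labels.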

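If that still does not work, you can drop the custom delimiter and instead apply a converter function to both columns to separate the numbers from the letters: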

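import pandas as pd

# Keep only the text after the last colon and convert it to an integer
# (int() ignores the leading space left over from the comma split).
func = lambda x: int(x.split(':')[-1])
df1 = pd.read_csv("test.csv",
                  names=["A", "B"],
                  engine="python",
                  converters={'A': func, 'B': func})
print(df1)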

Output:

    A   B
0  12  14
1   1   4
2   2   1
3  21  41
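For trying the converter approach without a file on disk, here is a minimal self-contained sketch. It assumes the same four sample rows, read from an in-memory string, and uses a named helper (strip_label is a hypothetical name) in place of the lambda. Converting to int is an addition made here so that the columns come out numeric rather than as strings with a leading space.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv.
DATA = "A: 12, B: 14\nA: 1, B: 4\nA: 2, B: 1\nA: 21, B: 41\n"


def strip_label(field):
    """Return the integer after the last colon, e.g. ' B: 14' -> 14."""
    return int(field.split(":")[-1])


# With the default comma separator each row has exactly two fields,
# such as "A: 12" and " B: 14", so no extra column is left over.
df2 = pd.read_csv(
    io.StringIO(DATA),
    names=["A", "B"],
    converters={"A": strip_label, "B": strip_label},
)
print(df2)
print(df2.dtypes)
```

Since the row is split on commas only, the label stays inside each field and the converter removes it, which avoids the empty leading field altogether.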

Regarding python - Pandas csv reader creates NaN index, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50132352/
