
python - pandas can't infer str type for double-quoted values

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 17:08:12

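I want to import data from a csv file with pandas.read_csv(). My data are strings wrapped in "" (although the strings are numbers that indicate categories). I found that pandas does not infer these strings as "object"; it infers them as int64. See the example below: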

a.csv:

uid, f_1, f_2
1, "1", 1.1
2, "2", 2.3
3, "0", 4.8

pandas.read_csv('a.csv').dtypes gives the following output:

uid:int64
f_1:int64
f_2:float64

The type of f_1 is inferred as 'int64' instead of 'object'.

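However, if I replace every " in a.csv with ', f_1 is correctly inferred as 'object'. How can I prevent the wrong inference without modifying a.csv? A second question: why does pandas infer strings as 'object' rather than 'str'?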

Best answer

I think you need to add the parameter skipinitialspace to read_csv:

skipinitialspace : boolean, default False. Skip spaces after delimiter.
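A minimal sketch of what skipinitialspace changes for data laid out like a.csv (a space after every comma). Without it, the header names keep their leading spaces, and the " that follows the space is not at the start of the field, so it is kept as a literal character. With it, the spaces are skipped and "1" is parsed as a quoted field. The variable names data, plain and skipped are illustrative only.

```python
import io

import pandas as pd

# Same layout as a.csv in the question: a space follows every comma.
data = 'uid, f_1, f_2\n1, "1", 1.1\n2, "2", 2.3\n3, "0", 4.8\n'

plain = pd.read_csv(io.StringIO(data))
print(list(plain.columns))     # header names keep their leading spaces
print(plain[' f_1'].tolist())  # the quotes survive as literal characters

skipped = pd.read_csv(io.StringIO(data), skipinitialspace=True)
print(list(skipped.columns))   # clean names: 'uid', 'f_1', 'f_2'
print(skipped['f_1'].dtype)    # quotes now act as CSV quoting, so int64
```

This also explains why passing dtype={'f_1': ...} without skipinitialspace has no effect: no column is named 'f_1'.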

Test:

import io

import numpy as np
import pandas as pd

temp = """uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

print(pd.read_csv(io.StringIO(temp)))
#    uid  f_1   f_2
# 0    1  "1"  1.19
# 1    2  "2"  2.30
# 2    3  "0"  4.80

# dtype alone doesn't work: without skipinitialspace the header is parsed as
# 'uid', ' f_1', ' f_2' (leading spaces), so the key 'f_1' matches no column,
# and the values still contain the literal quotes.
print(pd.read_csv(io.StringIO(temp), dtype={'f_1': np.int64}).dtypes)
# uid      int64
# f_1     object
# f_2    float64
# dtype: object

print(pd.read_csv(io.StringIO(temp), skipinitialspace=True).dtypes)
# uid      int64
# f_1      int64
# f_2    float64
# dtype: object
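If the goal is to keep f_1 as text (the category labels from the question), a sketch that combines both parameters on the same quoted data: skipinitialspace=True makes the column name 'f_1' match and lets the parser remove the quotes, and dtype={'f_1': str} stops the values from being converted to integers.

```python
import io

import pandas as pd

temp = '''uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8'''

# skipinitialspace: the column is named 'f_1' and "1" is read as a quoted field
# dtype=str: keep the parsed text instead of inferring int64
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, dtype={'f_1': str})
print(df['f_1'].tolist())
print(df.dtypes)
```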

If you want to remove the leading and trailing " characters from column f_1, use a converter:

import io

import pandas as pd

temp = """uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

print(pd.read_csv(io.StringIO(temp)))
#    uid  f_1   f_2
# 0    1  "1"  1.19
# 1    2  "2"  2.30
# 2    3  "0"  4.80

# remove the surrounding " characters
def converter(x):
    return x.strip('"')

# map each column name to its converter
converters = {'f_1': converter}

df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, converters=converters)
print(df)
#    uid f_1   f_2
# 0    1   1  1.19
# 1    2   2  2.30
# 2    3   0  4.80
print(df.dtypes)
# uid      int64
# f_1     object
# f_2    float64
# dtype: object
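With skipinitialspace=True the parser has already removed the quotes before any converter runs, so strip('"') acts as a safeguard rather than a necessity. A shorter sketch under that assumption passes the built-in str as the converter; the column then holds Python strings because every converted value is one.

```python
import io

import pandas as pd

temp = '''uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8'''

# Each converter receives the field text after unquoting,
# so the built-in str is enough to keep the values as text.
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, converters={'f_1': str})
print(df['f_1'].tolist())
```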

If you need the integer values in f_1 read as strings, use dtype:

import io

import pandas as pd

temp = """uid, f_1, f_2
1, 1, 1.19
2, 2, 2.3
3, 0, 4.8"""

print(pd.read_csv(io.StringIO(temp)).dtypes)
# uid      int64
# f_1      int64
# f_2    float64
# dtype: object

# dtype=str keeps f_1 as text instead of inferring int64
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, dtype={'f_1': str})

print(df)
#    uid f_1   f_2
# 0    1   1  1.19
# 1    2   2  2.30
# 2    3   0  4.80
print(df.dtypes)
# uid      int64
# f_1     object
# f_2    float64
# dtype: object
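The same parsing rule explains the question itself: in CSV, double quotes are quoting syntax (they let a field contain the delimiter), not a type marker, so "1" and 1 look identical once parsed. A sketch assuming a hypothetical version of a.csv with no space after the commas, which is the layout where the int64 result from the question appears. Because the question says the values are category codes, it also requests dtype='category'; read_csv parses the categories of such a column as strings.

```python
import io

import pandas as pd

# Hypothetical comma-only version of a.csv (no space after the commas).
data = 'uid,f_1,f_2\n1,"1",1.1\n2,"2",2.3\n3,"0",4.8\n'

# The quotes are stripped while tokenizing, so f_1 is inferred as int64.
print(pd.read_csv(io.StringIO(data)).dtypes)

# Ask for a categorical column instead; its categories are read as strings.
df = pd.read_csv(io.StringIO(data), dtype={'f_1': 'category'})
print(df['f_1'].dtype)
print(sorted(df['f_1'].cat.categories))
```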

Note: don't forget to replace io.StringIO(temp) with 'a.csv'.

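For str versus object, see the explanation here. In short, pandas has traditionally stored Python strings in columns of the generic object dtype, so even a column read with dtype=str is reported as object.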
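Besides object, pandas (1.0 and later) also offers a dedicated string extension dtype, requested with the alias 'string'. A minimal sketch comparing the two on a hypothetical comma-only version of a.csv; note that in pandas 3.0 the result of dtype=str itself changes from object to a string dtype, so the first printed dtype depends on the installed version.

```python
import io

import pandas as pd

# Hypothetical comma-only version of a.csv.
data = 'uid,f_1,f_2\n1,"1",1.1\n2,"2",2.3\n3,"0",4.8\n'

# dtype=str: Python strings in an object column (pandas < 3.0)
as_str = pd.read_csv(io.StringIO(data), dtype={'f_1': str})
# 'string': pandas' dedicated StringDtype
as_string = pd.read_csv(io.StringIO(data), dtype={'f_1': 'string'})
print(as_str['f_1'].dtype, as_string['f_1'].dtype)
```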

Regarding "python - pandas can't infer str type for double-quoted values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34378791/
