
python - Pandas: setting dtypes from a list

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:15:19

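I am reading a large file and want to save memory, so I need to specify a data type for every column in the DataFrame. I would like to take the types from a list of dtypes that I have already created.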

import pandas as pd

headers=['Record Identifier','Respondent_ID','Agency Code','Loan Type','Property Type','Loan Purpose','Owner Occupancy',
'Loan Amount','Preapprovals','Type of Action Taken','Metropolitan Statistical Area/Metropolitan Division','State Code',
'County Code','Census Tract','Applicant Ethnicity','Co-applicant Ethnicity','Applicant Race: 1','Applicant Race: 2',
'Applicant Race: 3','Applicant Race: 4','Applicant Race: 5','Co-applicant Race: 1','Co-applicant Race: 2',
'Co-applicant Race: 3','Co-applicant Race: 4','Co-applicant Race: 5','Applicant Sex','Co-applicant Sex',
'Applicant Income','Type of Purchaser','Denial Reason: 1','Denial Reason: 2','Denial Reason: 3','Rate Spread',
'HOEPA Status','Lien Status','Population','Minority Population %','FFIEC Median Family Income',
'Tract to MSA/MD Median Family Income %','Number of Owner Occupied Units','Number of 1- to 4-Family units']


dtypes=['int64','object','int64','int64','int64','int64','int64','int64','int64','int64','object','object','object','object',
'int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64',
'object','int64','int64','int64','int64','object','object','object','object','float64','int64','float64','int64',
'int64']


df = pd.read_csv('2017_lar.txt', sep="|", header=None, names=headers, dtype=dtypes, nrows=100)

print(df)

Error: TypeError: data type not understood

Best answer

You are using the dtype parameter incorrectly. It accepts only a single type name or a dict that maps column headers to types.

The documentation covers this clearly:

dtype : Type name or dict of column -> type, optional

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.


Because you passed a list, pandas assumes the entire list is a single data type, which it cannot interpret.
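To illustrate the two accepted forms, here is a minimal sketch. The sample data and the names RAW and COLS are made up for illustration and are not from the question's file. A single type name applies to every column, while a dict sets types per column and leaves any column it does not mention to be inferred.

```python
import io
import pandas as pd

RAW = "1|2|3\n4|5|6\n7|8|9\n"  # made-up sample data (assumption)
COLS = ["a", "b", "c"]         # hypothetical column names

# Form 1: a single type name is applied to every column.
all_float = pd.read_csv(io.StringIO(RAW), sep="|", header=None,
                        names=COLS, dtype="float64")

# Form 2: a dict sets the type per column; columns that are
# not listed in the dict ("a" and "c" here) are inferred by pandas.
mixed = pd.read_csv(io.StringIO(RAW), sep="|", header=None,
                    names=COLS, dtype={"b": "object"})

print(all_float.dtypes)
print(mixed.dtypes)
```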


Here is a correct usage:

import io
import pandas as pd

i = io.StringIO("""
1|2|3
4|5|6
7|8|9
""")

headers = ['a', 'b', 'c']
dtypes = ['int64', 'object', 'int']

df = pd.read_csv(i, header=None, names=headers, sep='|', dtype=dict(zip(headers, dtypes)))

>>> df
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

>>> df.dtypes
a     int64
b    object
c     int32
dtype: object
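One caveat with dict(zip(headers, dtypes)) is that zip stops at the shorter input without raising an error. If the two lists drift out of sync, the trailing columns are silently left to type inference. The sketch below adds a length check. The helper name build_dtype_map and the two sample rows are hypothetical, and only three of the question's column names are used.

```python
import io
import pandas as pd


def build_dtype_map(headers, dtypes):
    """Pair column names with dtypes, refusing lists of different lengths."""
    if len(headers) != len(dtypes):
        raise ValueError(
            f"{len(headers)} headers but {len(dtypes)} dtypes"
        )
    return dict(zip(headers, dtypes))


# Made-up rows standing in for the real pipe-delimited file (assumption).
sample = io.StringIO("1|A01|3.5\n2|B02|4.0\n")

headers = ["Record Identifier", "Respondent_ID", "Minority Population %"]
dtypes = ["int64", "object", "float64"]

df = pd.read_csv(
    sample,
    sep="|",
    header=None,
    names=headers,
    dtype=build_dtype_map(headers, dtypes),
)
print(df.dtypes)
```

With the full lists from the question, the same call would be dtype=build_dtype_map(headers, dtypes) in place of dtype=dtypes.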

Regarding "python - Pandas: setting dtypes from a list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57149661/
