gpt4 book ai didi

python - 将字符串数组(类别)从 Pandas 数据帧转换为整数数组

转载 作者:太空狗 更新时间:2023-10-29 17:52:53 26 4
gpt4 key购买 nike

我正在尝试做与 previous question 非常相似的事情但我得到一个错误。我有一个包含特征的 Pandas 数据框,标签我需要做一些转换以将特征和标签变量发送到机器学习对象中:

import pandas
import milk
from scikits.statsmodels.tools import categorical

然后我有:

trainedData=bigdata[bigdata['meta']<15]
untrained=bigdata[bigdata['meta']>=15]
#print trainedData
#extract two columns from trainedData
#convert to numpy array
features=trainedData.ix[:,['ratio','area']].as_matrix(['ratio','area'])
un_features=untrained.ix[:,['ratio','area']].as_matrix(['ratio','area'])
print 'features'
print features[:5]
##label is a string:single, touching,nuclei,dust
print 'labels'

labels=trainedData.ix[:,['type']].as_matrix(['type'])
print labels[:5]
#convert single to 0, touching to 1, nuclei to 2, dusts to 3
#
tmp=categorical(labels,drop=True)
targets=categorical(labels,drop=True).argmax(1)
print targets

输出控制台首先产生:

features
[[ 0.38846334 0.97681855]
[ 3.8318634 0.5724734 ]
[ 0.67710876 1.01816444]
[ 1.12024943 0.91508699]
[ 7.51749674 1.00156707]]
labels
[[single]
[touching]
[single]
[single]
[nuclei]]

然后我遇到了以下错误:

Traceback (most recent call last):
File "/home/claire/Applications/ProjetPython/projet particule et objet/karyotyper/DAPI-Trainer02-MILK.py", line 83, in <module>
tmp=categorical(labels,drop=True)
File "/usr/local/lib/python2.6/dist-packages/scikits.statsmodels-0.3.0rc1-py2.6.egg/scikits/statsmodels/tools/tools.py", line 206, in categorical
tmp_dummy = (tmp_arr[:,None]==data).astype(float)
AttributeError: 'bool' object has no attribute 'astype'

是否可以将数据框中的类别变量“type”转换为 int 类型? “type”可以取值“single”、“touching”、“nuclei”、“dusts”,我需要转换为 0、1、2、3 等 int 值。

最佳答案

以前的答案已经过时,所以这里有一个将字符串映射到数字的解决方案,适用于 0.18.1 版的 Pandas。

对于系列:

In [1]: import pandas as pd
In [2]: s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
'touching', 'single', 'nuclei'])
In [3]: s_enc = pd.factorize(s)
In [4]: s_enc[0]
Out[4]: array([0, 1, 2, 3, 1, 0, 2])
In [5]: s_enc[1]
Out[5]: Index([u'single', u'touching', u'nuclei', u'dusts'], dtype='object')

对于数据框:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei',
'dusts', 'touching', 'single', 'nuclei']})
In [3]: catenc = pd.factorize(df['labels'])
In [4]: catenc
Out[4]: (array([0, 1, 2, 3, 1, 0, 2]),
Index([u'single', u'touching', u'nuclei', u'dusts'],
dtype='object'))
In [5]: df['labels_enc'] = catenc[0]
In [6]: df
Out[4]:
labels labels_enc
0 single 0
1 touching 1
2 nuclei 2
3 dusts 3
4 touching 1
5 single 0
6 nuclei 2

关于python - 将字符串数组(类别)从 Pandas 数据帧转换为整数数组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/7813132/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com