Python: converting strings to categorical - numpy

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:22:00

I'm desperately trying to convert the string variables day and car2 in the following dataset to categorical:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23653 entries, 0 to 23652
Data columns (total 7 columns):
day              23653 non-null object
clustDep         23653 non-null int64
clustArr         23653 non-null int64
car2             23653 non-null object
clustRoute       23653 non-null int64
scheduled_seg    23653 non-null int64
delayed          23653 non-null int64
dtypes: int64(5), object(2)
memory usage: 1.4+ MB
None

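I've tried everything I could find on SO, as you can see in the code sample below. I'm running Python 2.7 and numpy 1.11.1. I also tried scikits.tools.categorical without success; it won't even load the namespace. Here is my code: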

import numpy as np
#from scikits.statsmodels import sm

trainId = np.random.choice(range(df.shape[0]), size=int(df.shape[0]*0.8), replace=False)
train = df[['day', 'clustDep', 'clustArr', 'car2', 'clustRoute', 'scheduled_seg', 'delayed']]

#for col in ['day', 'car2', 'scheduled_seg']:
# train[col] = train.loc[:, col].astype('category')

train['day'] = train['day'].astype('category')
#train['day'] = sm.tools.categorical(train, cols='day', drop=True)
#train['car2C'] = train['car2'].astype('category')
#train['scheduled_segC'] = train['scheduled_seg'].astype('category')


train = df.loc[trainId, train.columns]
testId = np.in1d(df.index.values, trainId, invert=True)
test = df.loc[testId, train.columns]


#from sklearn import tree
#clf = tree.DecisionTreeClassifier()
#clf = clf.fit(train.drop(['delayed'], axis=1), train['delayed'])

This produces the following warning:

/Users/air/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Any help would be greatly appreciated. Thanks a lot!

--- UPDATE --- Sample data:

         day  clustDep  clustArr  car2  clustRoute  scheduled_seg  delayed
0   Saturday        12        15    AA           1              5        1
1    Tuesday        12        15    AA           1              1        1
2    Tuesday        12        15    AA           1              5        1
3   Saturday        12        13    AA           4              3        1
4   Saturday         2        13    AB           6              3        1
5  Wednesday         2        13    IB           6              3        1
6     Monday         2        13    EY           6              3        0
7     Friday         2        13    EY           6              3        1
8   Saturday        11        13    AC           6              5        1
9     Friday        11        13    DL           6              5        1
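The message above is a warning rather than an error: pandas flags that train was taken from df and then modified. Separately, the later line train = df.loc[trainId, train.columns] rebuilds train from the unconverted df, so the earlier astype('category') result is discarded anyway. A minimal sketch of one way around both, assuming the ten sample rows stand in for the full df (the names data, cols, train_ids and the fixed seed are illustrative, not from the question): take an explicit .copy(), convert the string columns once, and only then split.

```python
import numpy as np
import pandas as pd

# Sample data from the question's update (stands in for the full df)
df = pd.DataFrame({
    'day': ['Saturday', 'Tuesday', 'Tuesday', 'Saturday', 'Saturday',
            'Wednesday', 'Monday', 'Friday', 'Saturday', 'Friday'],
    'clustDep': [12, 12, 12, 12, 2, 2, 2, 2, 11, 11],
    'clustArr': [15, 15, 15, 13, 13, 13, 13, 13, 13, 13],
    'car2': ['AA', 'AA', 'AA', 'AA', 'AB', 'IB', 'EY', 'EY', 'AC', 'DL'],
    'clustRoute': [1, 1, 1, 4, 6, 6, 6, 6, 6, 6],
    'scheduled_seg': [5, 1, 5, 3, 3, 3, 3, 3, 5, 5],
    'delayed': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
})

cols = ['day', 'clustDep', 'clustArr', 'car2', 'clustRoute',
        'scheduled_seg', 'delayed']

# .copy() makes data an independent DataFrame, so assigning to its
# columns is not treated as "setting on a copy of a slice"
data = df[cols].copy()

# Convert the string columns once, before splitting, so that both
# train and test inherit the category dtype
for col in ['day', 'car2']:
    data[col] = data[col].astype('category')

# 80/20 split on index labels; the fixed seed is only for reproducibility
rng = np.random.RandomState(0)
train_ids = rng.choice(data.index.values, size=int(len(data) * 0.8),
                       replace=False)
train = data.loc[train_ids]
test = data.drop(train_ids)  # every row not drawn into train
```

Because the test set comes from data.drop(train_ids), no separate boolean mask (np.in1d) is needed.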

Best answer

It works fine for me (Pandas 0.19.0):

In [155]: train
Out[155]:
         day  clustDep  clustArr  car2  clustRoute  scheduled_seg  delayed
0   Saturday        12        15    AA           1              5        1
1    Tuesday        12        15    AA           1              1        1
2    Tuesday        12        15    AA           1              5        1
3   Saturday        12        13    AA           4              3        1
4   Saturday         2        13    AB           6              3        1
5  Wednesday         2        13    IB           6              3        1
6     Monday         2        13    EY           6              3        0
7     Friday         2        13    EY           6              3        1
8   Saturday        11        13    AC           6              5        1
9     Friday        11        13    DL           6              5        1

In [156]: train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 7 columns):
day              10 non-null object
clustDep         10 non-null int64
clustArr         10 non-null int64
car2             10 non-null object
clustRoute       10 non-null int64
scheduled_seg    10 non-null int64
delayed          10 non-null int64
dtypes: int64(5), object(2)
memory usage: 640.0+ bytes

In [157]: train.day = train.day.astype('category')

In [158]: train.car2 = train.car2.astype('category')

In [159]: train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 7 columns):
day              10 non-null category
clustDep         10 non-null int64
clustArr         10 non-null int64
car2             10 non-null category
clustRoute       10 non-null int64
scheduled_seg    10 non-null int64
delayed          10 non-null int64
dtypes: category(2), int64(5)
memory usage: 588.0 bytes
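If, as the commented-out DecisionTreeClassifier lines in the question suggest, the end goal is a scikit-learn model, the category dtype is usually only an intermediate step, because scikit-learn's tree estimators expect numeric features. A minimal sketch of two common encodings, using the day, car2 and delayed columns from the sample data (the names codes, dummies, X and y are illustrative):

```python
import pandas as pd

# day, car2 and delayed columns from the question's sample data
df = pd.DataFrame({
    'day': ['Saturday', 'Tuesday', 'Tuesday', 'Saturday', 'Saturday',
            'Wednesday', 'Monday', 'Friday', 'Saturday', 'Friday'],
    'car2': ['AA', 'AA', 'AA', 'AA', 'AB', 'IB', 'EY', 'EY', 'AC', 'DL'],
    'delayed': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
})

for col in ['day', 'car2']:
    df[col] = df[col].astype('category')

# Option 1: integer codes, one column per feature. Categories default to
# sorted order, so the codes are labels, not a meaningful ranking.
codes = df[['day', 'car2']].apply(lambda s: s.cat.codes)

# Option 2: one 0/1 indicator column per category value (one-hot encoding)
dummies = pd.get_dummies(df, columns=['day', 'car2'], dtype=int)

X = dummies.drop('delayed', axis=1)  # feature matrix
y = dummies['delayed']               # target
```

Either codes or X could then take the place of train.drop(['delayed'], axis=1) in clf.fit. One-hot columns avoid implying an order among days or carriers, at the cost of one extra column per distinct value.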

Regarding "Python: converting strings to categorical - numpy", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39964451/
