
python: How to get the real feature names from feature_importances

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:08:03

I am using Python's sklearn random forest (ensemble.RandomForestClassifier) for classification, and feature_importances_ to find the classifier's important features. My current code is:

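from collections import Counter

import numpy as np
from sklearn import ensemble
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.preprocessing import LabelEncoder

# database, labels_raw, user_ids_raw and LeavePUsersOut are defined elsewhere (not shown)
venue_feature_start = []  # one Counter per trip
for trip in database:
    # e.g. Counter({'school': 1, 'hospital': 1, 'bus station': 2}); each key is a feature
    venue_feature_start.append(Counter(trip['POI']))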

feat_loc_vectorizer = DictVectorizer()
feat_loc_vectorizer.fit(venue_feature_start)
feat_loc_orig_mat = feat_loc_vectorizer.transform(venue_feature_start)

orig_tfidf = TfidfTransformer()
orig_ven_feat = orig_tfidf.fit_transform(feat_loc_orig_mat.tocsr())

# DictVectorizer() and TfidfTransformer() build the feature matrix; each instance
# has 580 dimensions, which means that there are 580 venue types

data = orig_ven_feat.tocsr()

le = LabelEncoder()
labels = le.fit_transform(labels_raw)
if "Unlabelled" in labels_raw:
    # transform() returns a one-element array; take the element before converting
    unlabelled_int = int(le.transform(["Unlabelled"])[0])
else:
    unlabelled_int = -1

valid_rows_idx = np.where(labels!=unlabelled_int)[0]
labels = labels[valid_rows_idx]
user_ids = np.asarray(user_ids_raw)
# user_ids is for cross validation, labels is for classification

clf = ensemble.RandomForestClassifier(n_estimators = 50)
cv_indices = LeavePUsersOut(user_ids[valid_rows_idx], n_folds = 10)
data = data[valid_rows_idx,:].toarray()
for train_ind, test_ind in cv_indices:
    train_data = data[train_ind,:]
    test_data = data[test_ind,:]
    labels_train = labels[train_ind]
    labels_test = labels[test_ind]

    print ("Training classifier...")
    clf.fit(train_data,labels_train)
    importances = clf.feature_importances_

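The problem now is that when I use feature_importances_, I get an array of dimension 580 (the same as the feature dimension), and I want to know the top 20 important features (the top 20 important venues).

I think the least I need is the indices of the 20 largest values in importances, but I don't know:

  1. How to get the indices of the top 20 values by importance
  2. Because I used DictVectorizer and TfidfTransformer, I don't know how to match the indices to the real venue names ("school", "home", ...)

Any ideas that could help me? Thanks a lot!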

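Best answer

To get the importance of each feature name, just iterate over the column names and feature_importances_ together (they map to each other one-to-one):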

for feat, importance in zip(df.columns, clf.feature_importances_):
    print('feature: {f}, importance: {i}'.format(f=feat, i=importance))
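
The asker's code has no DataFrame, so the names have to come from somewhere else. A minimal sketch of both steps asked about (ranking the importances, then mapping indices back to names) could look like the following. The helper `top_k_features` and the toy arrays are hypothetical and only for illustration; they are not part of the original answer.

```python
import numpy as np


def top_k_features(importances, feature_names, k=20):
    """Return (name, importance) pairs for the k largest importances."""
    importances = np.asarray(importances, dtype=float)
    feature_names = np.asarray(feature_names)
    if importances.shape[0] != feature_names.shape[0]:
        raise ValueError("importances and feature_names must have the same length")
    # argsort sorts ascending, so reverse it and keep the first k indices.
    top_idx = np.argsort(importances)[::-1][:k]
    return [(str(feature_names[i]), float(importances[i])) for i in top_idx]


# Toy data standing in for clf.feature_importances_ and the vectorizer's names.
toy_importances = [0.10, 0.55, 0.05, 0.30]
toy_names = ["bus station", "hospital", "home", "school"]

for name, score in top_k_features(toy_importances, toy_names, k=2):
    print("feature: {f}, importance: {i:.2f}".format(f=name, i=score))
```

With the objects from the question, the call would presumably be `top_k_features(clf.feature_importances_, feat_loc_vectorizer.get_feature_names_out(), k=20)`. `DictVectorizer.get_feature_names_out()` is available in scikit-learn 1.0 and later; older releases expose `get_feature_names()` instead. This relies on `TfidfTransformer` only reweighting the columns without reordering them, so column i of the TF-IDF matrix still corresponds to name i.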

Regarding "python: How to get the real feature names from feature_importances", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30355159/
