
scikit-learn - How does the threshold for LinearSVC's transform work?

Reposted. Author: 行者123. Updated: 2023-12-01 05:08:28

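I am using LinearSVC as a preprocessing step for a decision tree classifier. I fit the LinearSVC
and then call transform(X). I noticed that the number of features drops from about 35 to 9. I would like to know which features were actually selected.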

I know that, by default, transform(X) uses threshold='mean'. Could someone show me an example of how it decides whether to keep a feature?

Here is my coef_:
array([[ -2.45022173e-01, -8.61032928e-02, -2.39513401e-03,
-2.07443644e-02, 2.49547244e-03, -3.14133367e-02,
7.09627000e-03, 3.94563929e-03, 6.78145800e-02,
1.59497586e-01, -1.24063075e-01, -4.79223418e-02,
-3.70412138e-02, 4.39187481e-02, 1.30004636e-02,
-2.31911643e-03, -1.63937709e-03, -2.18402321e-03,
-2.65601394e-03, 1.48259224e-02, -6.15157373e-02,
-3.65242492e-04, 8.10479000e-02, -1.58338535e-01,
5.06225924e-03, 1.16183358e-03, 6.44170055e-02,
-2.56651350e-03, 1.62029008e-01, -1.69785296e+00,
-1.91045465e+00, -1.64206237e+00, -1.80735175e+00,
-1.39504546e+00, -1.66709852e+00],
[ 4.14083584e-01, 2.03703885e-01, 4.82783739e-03,
7.90756359e-02, -1.45063508e-03, 1.05486236e-01,
-3.01145160e-01, -7.81145855e-03, -3.39445309e-01,
-5.66603101e-01, 2.41489561e-01, 3.11615301e-01,
-3.59607168e-01, -4.04092005e-01, -3.18262477e-03,
8.14224001e-04, 8.64216590e-04, 6.59107091e-03,
5.48336293e-03, -1.76329713e-02, 2.33854833e-01,
-1.00455178e-01, -5.00175471e-02, 4.81448974e-02,
3.13891484e-01, 3.54014313e-03, 3.32840843e-01,
6.85018177e-05, -6.75410702e-01, -1.03258781e-01,
2.59870671e-01, -3.03956500e-01, -1.58732859e-01,
-3.89772985e-01, -2.55624888e-01],
[ 1.06132321e-01, 1.23617156e-01, 1.40819416e-03,
1.06118853e-01, 5.11221833e-04, -1.68780545e-01,
9.27425326e-02, 3.52220207e-03, 2.12134293e-01,
3.54667378e-01, 1.22840976e-01, -4.21232679e-01,
3.55037449e-01, -2.06715803e-01, 6.18856581e-02,
-4.63662372e-03, -5.04710160e-04, -4.65594740e-04,
1.01529235e-02, 1.15598254e-03, 4.49951214e-02,
2.20830485e-01, -1.01269555e-01, 3.03514605e-01,
-1.27056578e-01, -2.17123757e-02, -2.51044202e-01,
7.19562937e-03, -6.74304600e-01, 2.47410746e-01,
-7.76792375e-02, 2.26260621e-01, 3.83972532e-01,
4.35143804e-01, 3.50074110e-02],
[ 6.33038442e-02, 3.71367520e-01, -1.21238483e-02,
-5.92230089e-02, -2.69617795e-03, 2.44885573e-01,
-1.12043386e-01, -1.05526224e-01, -9.88583026e-02,
-6.09121814e-01, -5.16313417e-01, 2.83500385e-01,
2.04390765e-01, 9.13454922e-01, 2.12522482e-02,
4.67960378e-03, 3.78514732e-03, -1.89184862e-03,
-2.35710741e-02, 2.77863999e-02, 5.93172013e-01,
-3.98200956e-01, 2.04199614e-01, -6.20399607e-02,
1.19732985e-01, 1.16674647e-01, -1.27517918e-03,
-4.23253804e-03, -1.82480535e+00, 9.29959444e-01,
1.21162165e+00, 1.09899835e+00, 7.42987354e-01,
9.61956169e-01, 8.72089435e-01],
[ 2.98336593e-01, 1.36166556e-01, 8.55303000e-04,
1.13137553e-01, -4.11417197e-03, 2.59650136e-01,
7.87008264e-02, 7.22415689e-03, -3.64334467e-02,
-2.57473176e-02, -1.01132206e-01, -4.52864069e-02,
8.62911851e-03, -1.01396648e-01, -1.71810251e-01,
2.87556170e-02, -5.75335168e-03, -1.31809609e-03,
2.27847222e-02, -1.64198532e-02, -8.11859436e-03,
-2.60700154e-02, 1.74207263e-01, 1.10324971e-01,
6.65055594e-02, 4.11639440e-03, -9.68050856e-02,
4.32464307e-02, 1.26432150e+00, 2.80210335e-02,
1.30525549e-01, 4.34196521e-01, -2.46460632e-01,
3.85467301e-01, -2.58179093e-02]])
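I have read the documentation. What I am unsure about is how this "mean" is computed. Is it the mean of the per-feature means? With 5 classes and 35 features, each feature has a different coefficient for each class. Should I first take the mean for each feature across the classes, and then take the mean of those values?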

Best answer

From the documentation:

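The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If "median" (resp. "mean"), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. If None and if available, the object attribute threshold is used. Otherwise, "mean" is used by default.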

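Here the importances are given by the coefficients. For a two-dimensional coef_ such as yours (one row per class, one column per feature), the absolute values are summed over the classes to give a single importance per feature, and the "mean" threshold is the mean of those 35 importances. It is not a mean of the signed per-class coefficients.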
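To make the thresholding concrete, here is a minimal NumPy sketch of the rule quoted from the documentation. It assumes the importance of a feature is the sum of the absolute coefficients over all classes, which is how scikit-learn aggregates a multi-class coef_ as far as I know. The helper name select_by_mean_threshold, its scale argument, and the toy matrix are made up for illustration and are not part of scikit-learn.

```python
import numpy as np


def select_by_mean_threshold(coef, scale=1.0):
    """Return (importances, threshold, mask) for a coefficient matrix.

    Assumption (labelled, not taken from the question): the importance of
    a feature is the sum of |coefficient| over all classes.
    """
    # Treat a 1-D coef_ (binary problem) as a single-row matrix.
    coef = np.atleast_2d(np.asarray(coef, dtype=float))
    # One importance per feature (column).
    importances = np.abs(coef).sum(axis=0)
    # threshold="mean" corresponds to scale=1.0; "1.25*mean" to scale=1.25.
    threshold = scale * importances.mean()
    # Features whose importance is greater than or equal to the threshold are kept.
    mask = importances >= threshold
    return importances, threshold, mask


# Hypothetical 3-class, 4-feature coefficient matrix (not the asker's data).
toy_coef = np.array([
    [0.5, -0.01, 0.2, 0.00],
    [-1.0, 0.02, 0.1, 0.05],
    [0.3, -0.03, -0.4, 0.01],
])

importances, threshold, mask = select_by_mean_threshold(toy_coef)
kept = np.flatnonzero(mask)  # column indices of the retained features
print("importances:", importances)
print("threshold:", threshold)
print("kept feature indices:", kept)
```

In this toy case the importances are roughly 1.8, 0.06, 0.7 and 0.06, so their mean is about 0.655 and only features 0 and 2 survive. Passing the coef_ array from the question to the same helper should show which of the 35 columns clear the threshold. In recent scikit-learn releases the selection lives in SelectFromModel, whose get_support(indices=True) method reports the retained columns directly.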

Regarding "scikit-learn - How does the threshold for LinearSVC's transform work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26764249/
