
python - Isolation Forest

Reposted. Author: 太空宇宙. Updated: 2023-11-03 10:54:02

I'm currently using the IsolationForest method in Python to identify outliers in my dataset, but I don't fully understand the example on sklearn:

http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

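Specifically, what is the plot actually showing us? The observations have already been defined as normal or outliers, so I assume the shading of the contour plot indicates whether an observation really is an outlier (e.g., observations with higher anomaly scores lie in the darker shaded areas?).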

Finally, how is the following part of the code actually used (in particular the y_pred_* lines)?

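from sklearn.ensemble import IsolationForest
# rng, X_train, X_test and X_outliers are defined earlier in the linked example
# fit the model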
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

I'm guessing it's just provided for completeness, in case someone wants to print the output?

Thanks in advance for your help!

Best answer

For each observation, it tells whether or not it should be considered an outlier according to the fitted model: +1 means inlier, -1 means outlier.
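A minimal sketch of where those labels come from, assuming a recent scikit-learn (0.22 or later) and using the Iris data only for convenience: predict is the decision_function score thresholded at 0, so negative scores become -1 and everything else +1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest

X = load_iris().data
clf = IsolationForest(random_state=0).fit(X)

labels = clf.predict(X)            # +1 = inlier, -1 = outlier
scores = clf.decision_function(X)  # negative = more anomalous than the fitted threshold

# Reproduce predict() by hand: -1 wherever the score is below 0, +1 otherwise
manual = np.where(scores < 0, -1, 1)
print(np.array_equal(labels, manual))
```

This is also why the y_pred_* arrays only ever contain 1 and -1: they are a thresholded view of a continuous anomaly score.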


A simple example using the Iris data

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
data = load_iris()

X=data.data
y=data.target
X_outliers = rng.uniform(low=-4, high=4, size=(X.shape[0], X.shape[1]))

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=0)

clf = IsolationForest(random_state=0)
clf.fit(X_train)

y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

print(y_pred_test)
print(y_pred_outliers)

Result:

[-1 -1 -1 -1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1 -1
  1 -1 -1  1 -1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1 -1  1]

[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]

Explanation:

print(y_pred_test) returns both 1 and -1. This means that some samples of X_test are not outliers, while others are (source).

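On the other hand, print(y_pred_outliers) returns only -1. This means that all samples of X_outliers (150 in total, the same number of rows as the Iris data) are classified as outliers.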
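In practice, the -1/+1 arrays are usually turned into boolean masks to count, keep, or inspect the flagged rows. A sketch under the same setup as the Iris example; note that the default contamination changed to "auto" in scikit-learn 0.22, so the counts you get may differ from the output printed above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = load_iris().data
X_outliers = rng.uniform(low=-4, high=4, size=X.shape)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

clf = IsolationForest(random_state=0).fit(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# Count how many rows were labelled inlier (+1) and outlier (-1)
for name, pred in [("X_test", y_pred_test), ("X_outliers", y_pred_outliers)]:
    values, counts = np.unique(pred, return_counts=True)
    print(name, dict(zip(values.tolist(), counts.tolist())))

# Boolean masks: keep the inliers, set the flagged rows aside for inspection
X_test_clean = X_test[y_pred_test == 1]
X_test_flagged = X_test[y_pred_test == -1]
print(X_test_clean.shape, X_test_flagged.shape)
```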


Using your code

After your code, simply print y_pred_outliers:

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

print(y_pred_outliers)
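To run your snippet on its own, it needs the variables that the linked example defines earlier. The sketch below assumes data modelled on that example (two Gaussian blobs around (2, 2) and (-2, -2) for training and testing, plus uniformly drawn points as X_outliers); the exact sizes and spreads are assumptions. It also evaluates clf.decision_function on a grid, which, as far as I know, is what the example's contour shading is drawn from: lower scores mean more anomalous, and with the reversed Blues_r colormap they appear darker, consistent with the guess in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to show the plot interactively
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Assumed data, modelled on the linked example: two Gaussian blobs for training
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Regular new observations drawn from the same blobs
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Abnormal new observations drawn uniformly over the plotted area
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
print(y_pred_outliers)

# Contour shading: the decision function evaluated on a 50x50 grid.
# Lower (more anomalous) values are drawn darker by the reversed "Blues_r" colormap.
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap="Blues_r")
plt.scatter(X_train[:, 0], X_train[:, 1], c="white", edgecolor="k", s=20)
plt.scatter(X_test[:, 0], X_test[:, 1], c="green", edgecolor="k", s=20)
plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c="red", edgecolor="k", s=20)
plt.title("IsolationForest")
```

The scatter colours only mark how each point was generated; whether the model agrees is what the y_pred_* arrays report.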

A similar question, python - Isolation Forest, can be found on Stack Overflow: https://stackoverflow.com/questions/44951597/
