
python - Precision, recall, F1 score equal with sklearn

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 00:41:47

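I am trying to compare different distance calculation methods and different voting systems in the k-nearest-neighbours algorithm. My problem is that, no matter what I do, scikit-learn's precision_recall_fscore_support method returns exactly the same value for precision, recall and F-score. Why is that? I have tried different datasets (iris, glass and wine). What am I doing wrong? The code so far: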

```python
#!/usr/bin/env python3
from collections import Counter, defaultdict
import math
import random

from data_loader import DataLoader  # the asker's own ARFF loading module
from sklearn.metrics import precision_recall_fscore_support as pr


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    # abs() has to be applied to each difference, not to a list
    return sum(abs(a - b) for a, b in zip(x, y))


def get_neighbours(training_set, test_instance, k, distance=euclidean_distance):
    # Each row holds four features followed by the class label (iris layout)
    names = [instance[4] for instance in training_set]
    features = [instance[0:4] for instance in training_set]
    distances = [distance(test_instance, row) for row in features]
    # sorted() returns a new list, so its result must be assigned
    distances = sorted(zip(distances, names), key=lambda x: x[0])
    return distances[:k]


def plurality_voting(nearest_neighbours):
    classes = [label for _, label in nearest_neighbours]
    return Counter(classes).most_common(1)[0][0]


def _weighted_vote(nearest_neighbours, weight):
    # Add up each neighbour's weight per class and return the heaviest class
    totals = defaultdict(float)
    for dist, label in nearest_neighbours:
        totals[label] += weight(dist)
    return max(totals, key=totals.get)


def weighted_distance_voting(nearest_neighbours):
    # Weight 1/d; the small epsilon avoids dividing by zero on exact matches
    return _weighted_vote(nearest_neighbours, lambda d: 1 / (d + 1e-12))


def weighted_distance_squared_voting(nearest_neighbours):
    # Weight 1/d^2; note that "1 / d*d" would parse as (1 / d) * d == 1
    return _weighted_vote(nearest_neighbours, lambda d: 1 / (d * d + 1e-12))


def main():
    data = DataLoader.load_arff("datasets/iris.arff")
    dataset = data["data"]
    # random.seed(42)
    random.shuffle(dataset)
    train = dataset[:100]
    test = dataset[100:150]
    classes = [instance[4] for instance in test]
    predictions = []
    for test_instance in test:
        neighbours = get_neighbours(train, test_instance[0:4], 15)
        predictions.append(weighted_distance_voting(neighbours))
    print(pr(classes, predictions, average="micro"))


if __name__ == "__main__":
    main()
```
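For comparison, the same grid of distance metrics and voting schemes can be sketched with scikit-learn's own KNeighborsClassifier. This is a minimal sketch, not the asker's implementation, and it rests on several assumptions: it uses scikit-learn's bundled copy of iris (load_iris) instead of the ARFF file and DataLoader, a stratified 100/50 split via train_test_split, weights="uniform" for plurality voting, weights="distance" for 1/d voting, and a hypothetical helper inverse_square (passed as a callable) for 1/d² voting. It scores with average="macro", because micro averaging would report one identical number for precision, recall and F1.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def inverse_square(distances):
    # Hypothetical helper: 1/d^2 weights; the epsilon guards against d == 0
    return 1.0 / (distances ** 2 + 1e-12)


# Voting scheme name -> value for KNeighborsClassifier's `weights` argument
VOTING = {"plurality": "uniform", "1/d": "distance", "1/d^2": inverse_square}


def compare_knn_variants(k=15, seed=42):
    X, y = load_iris(return_X_y=True)
    # 100 training rows and 50 test rows, as in the question
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=100, test_size=50, random_state=seed, stratify=y
    )
    results = {}
    for metric in ("euclidean", "manhattan"):
        for name, weights in VOTING.items():
            clf = KNeighborsClassifier(n_neighbors=k, metric=metric, weights=weights)
            pred = clf.fit(X_train, y_train).predict(X_test)
            # Macro averaging: per-class scores averaged with equal weight
            p, r, f, _ = precision_recall_fscore_support(
                y_test, pred, average="macro", zero_division=0
            )
            results[(metric, name)] = (p, r, f)
    return results


if __name__ == "__main__":
    for (metric, name), (p, r, f) in compare_knn_variants().items():
        print(f"{metric:<9} {name:<9} P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Iris is an easy dataset, so several variants may still tie; the glass dataset is more likely to separate them.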

Best answer

The problem is that you are using "micro" averaging.

As explained here:

As is written in the documentation: "Note that for “micro”-averaging in a multiclass setting will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall." http://scikit-learn.org/stable/modules/model_evaluation.html
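The reason is a counting argument. In single-label multiclass classification, every wrong prediction is a false positive for the class that was predicted and, at the same time, a false negative for the class that was actually true. Summed over all classes, FP and FN are therefore always equal, so micro precision TP/(TP+FP) equals micro recall TP/(TP+FN), and both equal accuracy because TP is simply the number of correct predictions. A small pure-Python sketch with made-up labels:

```python
def micro_counts(y_true, y_pred):
    """Total TP, FP and FN summed over all classes (one-vs-rest per class)."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the true class was another
            elif t == label:
                fn += 1  # an instance of this class was predicted as another


    return tp, fp, fn


# Made-up labels: 4 of 6 predictions are correct
y_true = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica", "versicolor", "versicolor"]

tp, fp, fn = micro_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# tp=4, fp=2, fn=2; precision, recall and accuracy are all 4/6
print(tp, fp, fn, precision, recall, accuracy)
```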

But if you drop a majority label, using the labels parameter, then micro-averaging differs from accuracy, and precision differs from recall.
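A minimal sketch with scikit-learn (the integer labels are invented for illustration) contrasts three cases: micro averaging over all labels, micro averaging with the majority label 0 excluded via labels, and macro averaging. Only the first collapses to a single number equal to accuracy.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Invented toy labels; class 0 is the majority class
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)

# All labels included: micro precision, recall and F1 all equal accuracy
p_all, r_all, f_all, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro"
)

# Majority label 0 excluded through `labels`: micro precision and recall differ
p_sub, r_sub, f_sub, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average="micro"
)

# Macro averaging: per-class scores averaged with equal weight
p_mac, r_mac, f_mac, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)

print("accuracy      ", round(acc, 3))
print("micro, all    ", round(p_all, 3), round(r_all, 3), round(f_all, 3))
print("micro, no 0   ", round(p_sub, 3), round(r_sub, 3), round(f_sub, 3))
print("macro         ", round(p_mac, 3), round(r_mac, 3), round(f_mac, 3))
```

For comparing distance and voting variants, average="macro" (or average=None for per-class scores) is usually more informative than micro averaging.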

Regarding "python - Precision, recall, F1 score equal with sklearn", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41624878/
