python - How to perform a (modified) t-test for multiple variables and multiple models

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:15:35

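I have created and analyzed around 16 machine learning models using WEKA. I now have a CSV file that shows the models' metrics (such as percent_correct, F-measure, recall, precision, etc.). I am trying to conduct a (modified) Student's t-test on these models. I am able to conduct one (following this link) in which I compare only ONE variable common to only TWO models. I want to perform one (or several) t-tests with MULTIPLE variables and MULTIPLE models at once.

As mentioned, I can only run the test with one variable (say, F-measure) across two models (say, decision table and neural network).

Here is the code for it. I am running a Kolmogorov-Smirnov test (modified t):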

from matplotlib import pyplot
from pandas import read_csv, DataFrame
from scipy.stats import ks_2samp

results = DataFrame()
results['A'] = read_csv('LMT (f-measure).csv', header=None).values[:, 0]
results['B'] = read_csv('LWL (f-measure).csv', header=None).values[:, 0]
print(results.describe())
results.boxplot()
pyplot.show()
results.hist()
pyplot.show()

value, pvalue = ks_2samp(results['A'], results['B'])
alpha = 0.05
print(value, pvalue)
if pvalue > alpha:
    print('Samples are likely drawn from the same distributions (fail to reject H0)')
else:
    print('Samples are likely drawn from different distributions (reject H0)')

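Any ideas?

Best answer

This is a simple solution to my question. It only deals with two models and two variables, but you could easily keep lists with the names of the classifiers and the metrics you want to analyze. For my purposes, I just change the values of COI, ROI_1, and ROI_2 respectively.

NOTE: This solution is also generalizable. How? Just change the values of COI, ROI_1, and ROI_2, and load any chosen dataset in df = pandas.read_csv("FILENAME.csv", ...). If you want a different visualization, just change the pyplot settings near the end.

The key was assigning a new DataFrame from the original DataFrame using the .loc["SOMESTRING"] indexer. It removes all the rows in the data EXCEPT the ones whose label is given as the argument.

Remember, however, to either include index_col=0 when you read the file OR set the index of the DataFrame by some other method. Without doing this, your row labels will just be integer indexes, from 0 to MAX_INDEX.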
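To illustrate the point about index_col=0 and .loc, here is a minimal sketch. The CSV contents, the column name Key_Scheme, and the labels ModelA and ModelB are hypothetical stand-ins, not taken from the original results file.

```python
import io
import pandas

# Hypothetical stand-in for the results CSV: the first column holds the
# classifier name, and each classifier appears once per run.
csv_text = """Key_Scheme,Area_under_PRC,F_measure
ModelA,0.91,0.80
ModelA,0.89,0.82
ModelB,0.75,0.70
ModelB,0.78,0.72
"""

# Without index_col, the rows are labelled 0..N-1, so a lookup such as
# plain.loc["ModelA"] would raise a KeyError.
plain = pandas.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels.
indexed = pandas.read_csv(io.StringIO(csv_text), index_col=0)

# When a label is repeated, .loc returns every row carrying that label.
model_a = indexed.loc["ModelA"]

print(list(plain.index))
print(model_a)
```

Because the label is repeated, model_a is a DataFrame holding only the ModelA rows, which is the row-filtering step the answer relies on.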

# Written: April 4, 2019

import pandas # for reading and handling the data
from matplotlib import pyplot # for visualizations
from scipy.stats import ks_2samp # for 2-sample Kolmogorov-Smirnov test
import os # for deleting CSV files

# Function which isolates the columns of interest in a DataFrame
def removeColumns(DataFrame, typeArray, stringOfInterest):
    # Drop, in place, every column whose name does not contain stringOfInterest
    for i in range(0, len(typeArray)):
        if typeArray[i].find(stringOfInterest) != -1:
            continue
        else:
            DataFrame.drop(typeArray[i], axis = 1, inplace = True)

# Get the whole DataFrame
df = pandas.read_csv("ExperimentResultsCondensed.csv", index_col=0)
dfCopy = df.copy() # a real copy; plain assignment would only alias df

# Specified metrics and models for comparison
COI = "Area_under_PRC"
ROI_1 = "weka.classifiers.meta.AdaBoostM1[DecisionTable]"
ROI_2 = "weka.classifiers.meta.AdaBoostM1[DecisionStump]"

# Lists of header and row in dataFrame
# `rows` may act strangely
headers = list(df.dtypes.index)
rows = list(df.index)

# remove irrelevant rows (copy so the in-place column drop below does not act on a view)
df1 = dfCopy.loc[ROI_1].copy()
df2 = dfCopy.loc[ROI_2].copy()

# remove irrelevant columns
removeColumns(df1, headers, COI)
removeColumns(df2, headers, COI)

# Make CSV files (no header row, because they are read back with header=None)
df1.to_csv(ROI_1 + "-" + COI + ".csv", index=False, header=False)
df2.to_csv(ROI_2 + "-" + COI + ".csv", index=False, header=False)

results = pandas.DataFrame()
# Read CSV files
# The CSV files can be of any metric/measure; the one named in COI is used here
results[ROI_1] = pandas.read_csv(ROI_1 + "-" + COI + ".csv", header=None).values[:, 0]
results[ROI_2] = pandas.read_csv(ROI_2 + "-" + COI + ".csv", header=None).values[:, 0]

# Kolmogorov-Smirnov test since we have Non-Gaussian, independent, distinctive variance datasets
# Test configurations
value, pvalue = ks_2samp(results[ROI_1], results[ROI_2])
# Corresponding confidence level: 95%
alpha = 0.05

# Output the results
print('\n')
print('\033[1m' + '>>>TEST STATISTIC: ')
print(value)
print(">>>P-VALUE: ")
print(pvalue)
if pvalue > alpha:
    print('\t>>Samples are likely drawn from the same distributions (fail to reject H0 - NOT SIGNIFICANT)')
else:
    print('\t>>Samples are likely drawn from different distributions (reject H0 - SIGNIFICANT)')

# Plot files
df1.plot.density()
pyplot.xlabel(str(COI + " Values"))
pyplot.ylabel(str("Density"))
pyplot.title(str(COI + " Density Distribution of " + ROI_1))
pyplot.show()

df2.plot.density()
pyplot.xlabel(str(COI + " Values"))
pyplot.ylabel(str("Density"))
pyplot.title(str(COI + " Density Distribution of " + ROI_2))
pyplot.show()

# Delete Files
os.remove(str(ROI_1 + "-" + COI + ".csv"))
os.remove(str(ROI_2 + "-" + COI + ".csv"))
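The answer mentions that lists of classifier names and metrics could replace the hand-edited COI, ROI_1, and ROI_2 values. The following is only a sketch of that idea, not the author's code: the helper ks_for_all_pairs, the sample data, and the labels ModelA, ModelB, and ModelC are hypothetical. It assumes, as in the answer, that the DataFrame is indexed by classifier name with one row per run. It also skips the temporary CSV files and passes the selected values straight to ks_2samp.

```python
import io
from itertools import combinations

import pandas
from scipy.stats import ks_2samp


def ks_for_all_pairs(df, models, metrics, alpha=0.05):
    """Run a two-sample KS test for every pair of models on every metric."""
    records = []
    for metric in metrics:
        for model_1, model_2 in combinations(models, 2):
            # A list label keeps the result a Series even for a single row.
            sample_1 = df.loc[[model_1], metric].to_numpy()
            sample_2 = df.loc[[model_2], metric].to_numpy()
            statistic, pvalue = ks_2samp(sample_1, sample_2)
            records.append({
                "metric": metric,
                "model_1": model_1,
                "model_2": model_2,
                "statistic": float(statistic),
                "pvalue": float(pvalue),
                "significant": bool(pvalue <= alpha),
            })
    return pandas.DataFrame(records)


# Hypothetical data: three classifiers, four runs each, two metrics.
csv_text = """Key_Scheme,Area_under_PRC,F_measure
ModelA,0.91,0.80
ModelA,0.89,0.82
ModelA,0.90,0.81
ModelA,0.92,0.83
ModelB,0.75,0.70
ModelB,0.78,0.72
ModelB,0.76,0.71
ModelB,0.77,0.69
ModelC,0.88,0.79
ModelC,0.74,0.81
ModelC,0.93,0.70
ModelC,0.79,0.84
"""
df = pandas.read_csv(io.StringIO(csv_text), index_col=0)

models = list(dict.fromkeys(df.index))  # unique labels, original order
metrics = list(df.columns)

summary = ks_for_all_pairs(df, models, metrics)
print(summary)
```

With many models and metrics the number of tests grows quickly, so some uncorrected p-values will fall below alpha by chance alone; the single fixed threshold used here is a simplification.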

Regarding "python - How to perform a (modified) t-test for multiple variables and multiple models", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55503358/
