c# - Getting the BinaryClassification FastTree FeatureNames for PFI analysis

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:43:22

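I built a simple BinaryClassification FastTree model in ML.NET 1.0.0 using a subset of the columns in trainingDataView. Now I want to run a permutation feature importance (PFI) analysis, but I can't seem to isolate just the columns/features used by the model from all the columns in the IDataView.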

I have been following the binary classification PFI example at this link.

var trainingDataView = mlContext.Data.LoadFromTextFile<FPPCNTKData>(TrainDataPath, hasHeader: false, separatorChar: ' ');

var pipeline = mlContext.Transforms.Concatenate("Features",
        "mCalc_FPP_Legs_Range",
        "mCalc_FPP_Legs_Ticks",
        "mCalc_FPP_Legs_Bars",
        "mCalc_FPP_Legs_TMins",
        "mCalc_FPP_Diag_RangeBars",
        "mCalc_FPP_Diag_RangeTMins",
        "mCalc_FPP_Diag_TicksBars",
        "mCalc_FPP_Diag_TicksTMins",
        "mCalc_XD_XA_Mult_Ticks",
        "mCalc_AB_XA_Mult_Ticks",
        "mCalc_AD_XA_Mult_Ticks",
        "mCalc_BC_XA_Mult_Ticks",
        "mCalc_BC_AB_Mult_Ticks",
        "mCalc_CD_AB_Mult_Ticks",
        "mCalc_CD_BC_Mult_Ticks",
        "mCalc_CD_BD_Mult_Ticks")
    .Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "mHiProfitOneHot", featureColumnName: "Features"));

var trainedModel = pipeline.Fit(trainingDataView);

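As you can see below, the PFI entries are mislabeled because I am collecting the feature names from the original trainingDataView rather than from the features the model actually uses.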

//// Compute the permutation metrics using the properly normalized data.
var linearPredictor = trainedModel.LastTransformer;
var transformedData = trainedModel.Transform(trainingDataView);
var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
linearPredictor, transformedData, labelColumnName: "mHiProfitOneHot", permutationCount: 3);

// Now let's look at which features are most important to the model overall.
// Get the feature indices sorted by their impact on AUC.
var sortedIndices = permutationMetrics.Select((metrics, index) => new { index, metrics.AreaUnderRocCurve })
    .OrderByDescending(feature => Math.Abs(feature.AreaUnderRocCurve.Mean))
    .Select(feature => feature.index);

// Get the feature names from the training set
var featureNames =
trainingDataView.Schema.AsEnumerable()
.Select(column => column.Name) // Get the column names
.Where(name => name != "mHiProfitOneHot") // Drop the Label
.ToArray();


Console.WriteLine("Feature\tModel Weight\tChange in AUC\t95% Confidence in the Mean Change in AUC");
var auc = permutationMetrics.Select(x => x.AreaUnderRocCurve).ToArray();
foreach (int i in sortedIndices)
{
Console.WriteLine("{0}\t{1:0.00}\t{2:G4}\t{3:G4}",
featureNames[i],
linearPredictor.Model.SubModel.TrainedTreeEnsemble.TreeWeights[i],
auc[i].Mean,
1.96 * auc[i].StandardError);
}

Is it possible to extract the subset of feature names directly from the model? Thank you.

Best answer

You can search your model (assuming it is a TransformerChain, as in your case) for a ColumnConcatenatingTransformer and read its input column names.

string[] columnNames = (model
.FirstOrDefault(t => t is ColumnConcatenatingTransformer) as ColumnConcatenatingTransformer)
?.Columns
?.FirstOrDefault(c => c.outputColumnName == "Features")
.inputColumnNames;
Console.WriteLine(String.Join(", ", columnNames));
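Once the names come from the model, they can be lined up with the PFI results by index. The following is only a minimal sketch in plain C# with no ML.NET dependency: the class `PfiReport` and its method `Rank` are hypothetical names introduced here, not part of any library. It assumes every concatenated input column is a scalar, so that each name corresponds to exactly one slot of "Features", and it throws when the number of names differs from the number of PFI entries, which is the mismatch that arises when names are taken from the whole trainingDataView schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an ML.NET type): pairs feature names with
// per-feature PFI values that share the same index.
public static class PfiReport
{
    public static List<(string Name, double MeanChange)> Rank(
        IReadOnlyList<string> featureNames,
        IReadOnlyList<double> meanAucChange)
    {
        // A count mismatch means the names were not taken from the
        // columns that actually feed the "Features" vector.
        if (featureNames.Count != meanAucChange.Count)
            throw new ArgumentException(
                $"Got {featureNames.Count} names for {meanAucChange.Count} PFI entries.");

        // Pair by index, then sort by absolute change, largest first.
        return Enumerable.Range(0, featureNames.Count)
            .Select(i => (Name: featureNames[i], MeanChange: meanAucChange[i]))
            .OrderByDescending(f => Math.Abs(f.MeanChange))
            .ToList();
    }
}
```

With the variables from the question and the answer, the call would look something like `PfiReport.Rank(columnNames, permutationMetrics.Select(m => m.AreaUnderRocCurve.Mean).ToArray())`, and the returned list can be printed in place of the `featureNames[i]` lookup.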

Source: a similar question on Stack Overflow, "c# - Getting the BinaryClassification FastTree FeatureNames for PFI analysis": https://stackoverflow.com/questions/55656712/
