
java - Weka: Supervised discretization issues and the error "Not enough training instances" during attribute selection

Reposted. Author: 行者123. Updated: 2023-12-02 07:37:57

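For the past month or so I have been teaching myself the Weka API (I'm a student). I'm writing a program that filters a specific set of data and eventually builds a Bayes net for it. A week ago I finished my discretization class and my attribute selection class. A few days ago I realized I needed to change my discretization function to be supervised, and I ended up using the default Fayyad & Irani method. After that change, I started getting this error in my attribute selection class: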

Exception in thread "main" weka.core.WekaException: 
weka.attributeSelection.CfsSubsetEval: Not enough training instances with class labels (required: 1, provided: 0)!
at weka.core.Capabilities.test(Capabilities.java:1138)
at weka.core.Capabilities.test(Capabilities.java:1023)
at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
at weka.attributeSelection.CfsSubsetEval.buildEvaluator(CfsSubsetEval.java:331)
at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:597)
at weka.filters.supervised.attribute.AttributeSelection.batchFinished(AttributeSelection.java:456)
at weka.filters.Filter.useFilter(Filter.java:663)
at AttributeSelectionFilter.selectionFilter(AttributeSelectionFilter.java:29)
at Runner.main(Runner.java:70)

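My attribute selection worked fine before the change, so I suspect I am doing something wrong in my discretization class. The second part of my question is related: I also noticed that my discretization class doesn't seem to actually discretize the data. It just puts all the numeric data into a single range instead of binning it strategically the way Fayyad & Irani's method should.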

Here is my discretization class:

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.NumericToNominal;

public class DiscretizeFilter
{
    private Instances data;
    private boolean sensitiveOption;
    private Filter filter = new Discretize();

    public DiscretizeFilter(Instances data, boolean sensitiveOption)
    {
        this.data = data;
        this.sensitiveOption = sensitiveOption;
    }

    public Instances discreteFilter() throws Exception
    {
        NumericToNominal nm = new NumericToNominal();
        nm.setInputFormat(data);
        Filter.useFilter(data, nm);
        Instances nominalData = nm.getOutputFormat();

        if(sensitiveOption)//if the user wants extra sensitivity
        {
            String options[] = new String[1];
            options[0] = options[0];
            options[2] = "-E";
            ((Discretize) filter).setOptions(options);
        }
        filter.setInputFormat(nominalData);
        Filter.useFilter(nominalData,filter);
        return filter.getOutputFormat();
    }
}

Here is my attribute selection class:

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.filters.supervised.attribute.AttributeSelection;

public class AttributeSelectionFilter
{
    public Instances selectionFilter(Instances data) throws Exception
    {
        AttributeSelection filter = new AttributeSelection();

        for(int i = 0; i < data.numInstances(); i++)
        {
            filter.input(data.instance(i));
        }
        CfsSubsetEval eval = new CfsSubsetEval();
        BestFirst search = new BestFirst();
        filter.setSearch(search);
        filter.setEvaluator(eval);

        filter.setInputFormat(data);
        AttributeSelection.useFilter(data, filter);

        return filter.getOutputFormat();
    }

    public int attributeCounter(Instances data)
    {
        return data.numAttributes();
    }
}

Any help would be greatly appreciated!

Best answer

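Weka stores attribute values internally as doubles. The exception appears to be thrown because every instance in the dataset (data) is "missing its class", that is, for some reason each one has been given the internal class attribute value NaN ("not a number"). I suggest double-checking that the class attribute of data has been created and set correctly.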
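To make the "required: 1, provided: 0" part of the message concrete, here is a minimal sketch in plain Java with no Weka dependency. It is a simplified model of the idea described above, not Weka's actual implementation: the class name ClassLabelCheck and the method countLabeled are hypothetical, and the only assumption carried over from Weka is that a missing class value is represented as NaN.

```java
import java.util.Arrays;

public class ClassLabelCheck {

    // Hypothetical helper, not part of Weka: counts the instances whose
    // class value is present, treating NaN as "class is missing".
    static long countLabeled(double[] classValues) {
        return Arrays.stream(classValues)
                     .filter(v -> !Double.isNaN(v))
                     .count();
    }

    public static void main(String[] args) {
        // Every class value missing: nothing usable for a supervised method.
        double[] allMissing = {Double.NaN, Double.NaN, Double.NaN};
        // Two labeled instances and one with a missing class.
        double[] partlyLabeled = {0.0, 1.0, Double.NaN};
        // A dataset with no instances at all also provides zero labels.
        double[] empty = new double[0];

        System.out.println(countLabeled(allMissing));    // prints 0
        System.out.println(countLabeled(partlyLabeled)); // prints 2
        System.out.println(countLabeled(empty));         // prints 0
    }
}
```

In Weka itself, the corresponding checks would be data.classIndex(), which returns a negative value when no class attribute has been set via setClassIndex, and classIsMissing() on each instance. The sketch also shows that an empty dataset yields a count of zero, so it may be worth confirming that data.numInstances() is not 0 before the filter runs. One way that could happen is using the result of getOutputFormat(), which is documented as returning the output structure only, instead of the Instances returned by Filter.useFilter.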

Regarding "java - Weka: Supervised discretization issues and the error "Not enough training instances" during attribute selection", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13001480/
