
java - Decision tree implementation problem in Apache Spark using Java

Reposted · Author: 搜寻专家 · Updated: 2023-10-31 08:03:33

I am trying to implement a simple demo of a decision-tree classifier using Java and Apache Spark 1.0.0. I am basing it on http://spark.apache.org/docs/1.0.0/mllib-decision-tree.html . So far I have written the code listed below.

With the code below, I get an error on this line:

org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy();

Type mismatch: cannot convert from Entropy to Impurity. That is strange to me, since the Entropy class implements the Impurity interface:

https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/mllib/tree/impurity/Entropy.html

I am looking for an answer to why I cannot make this assignment.
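A likely cause, sketched below with mock classes (this is illustrative of how scalac compiles code, not Spark's actual source): a Scala `object` such as Entropy compiles to two JVM classes. The class named `Entropy` holds only static forwarder methods and does not implement the `Impurity` trait; the singleton that does implement it is a separate class `Entropy$`, reachable from Java through its static `MODULE$` field. So `new Entropy()` constructs the forwarder class, which is not an `Impurity`, hence the type mismatch.

```java
// Mock of how a Scala `object Entropy extends Impurity` appears to Java
// (illustrative only; names mirror scalac's output conventions).
interface Impurity {
    double calculate(double c0, double c1);
}

// The singleton class: it implements the trait and has no public constructor.
final class Entropy$ implements Impurity {
    public static final Entropy$ MODULE$ = new Entropy$();

    private Entropy$() {} // `new` is illegal outside this class

    @Override
    public double calculate(double c0, double c1) {
        double total = c0 + c1;
        double e = 0.0;
        for (double c : new double[] { c0, c1 }) {
            if (c > 0.0) {
                double f = c / total;
                e -= f * (Math.log(f) / Math.log(2.0)); // entropy, log base 2
            }
        }
        return e;
    }
}

public class ModuleSketch {
    public static void main(String[] args) {
        // Java reaches a Scala singleton through MODULE$, never through `new`:
        Impurity impurity = Entropy$.MODULE$;
        System.out.println(impurity.calculate(5.0, 5.0)); // 1.0 for a 50/50 split
    }
}
```

Under this reading, the assignment fails not because Entropy lacks the interface, but because the class Java's `new` reaches is the wrong one of the two generated classes.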

package decisionTree;

import java.util.regex.Pattern;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.DecisionTree;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.configuration.Strategy;
import org.apache.spark.mllib.tree.impurity.Gini;
import org.apache.spark.mllib.tree.impurity.Impurity;

import scala.Enumeration.Value;

public final class DecisionTreeDemo {

    // Parses lines of the form "label,f1 f2 f3 ..." into LabeledPoints.
    static class ParsePoint implements Function<String, LabeledPoint> {
        private static final Pattern COMMA = Pattern.compile(",");
        private static final Pattern SPACE = Pattern.compile(" ");

        @Override
        public LabeledPoint call(String line) {
            String[] parts = COMMA.split(line);
            double y = Double.parseDouble(parts[0]);
            String[] tok = SPACE.split(parts[1]);
            double[] x = new double[tok.length];
            for (int i = 0; i < tok.length; ++i) {
                x[i] = Double.parseDouble(tok[i]);
            }
            return new LabeledPoint(y, Vectors.dense(x));
        }
    }

    public static void main(String[] args) throws Exception {

        if (args.length < 1) {
            System.err.println("Usage: DecisionTreeDemo <file>");
            System.exit(1);
        }

        JavaSparkContext ctx = new JavaSparkContext("local[4]", "Log Analyzer",
                System.getenv("SPARK_HOME"),
                JavaSparkContext.jarOfClass(DecisionTreeDemo.class));

        JavaRDD<String> lines = ctx.textFile(args[0]);
        JavaRDD<LabeledPoint> points = lines.map(new ParsePoint()).cache();

        int iterations = 100;

        int maxBins = 2;
        int maxMemory = 512;
        int maxDepth = 1;

        // Compile error on the next line:
        // "Type mismatch: cannot convert from Entropy to Impurity"
        org.apache.spark.mllib.tree.impurity.Impurity impurity = new org.apache.spark.mllib.tree.impurity.Entropy();

        Strategy strategy = new Strategy(Algo.Classification(), impurity, maxDepth,
                maxBins, null, null, maxMemory);

        ctx.stop();
    }
}
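For reference, the ParsePoint class above expects input lines of the form `label,f1 f2 f3 ...` — a comma between the label and the features, spaces between features. A minimal standalone sketch of that parsing, using a plain array instead of Spark's LabeledPoint:

```java
import java.util.regex.Pattern;

// Standalone version of the ParsePoint logic above: splits "label,f1 f2 ..."
// into one array whose first entry is the label, without any Spark types.
public class ParseSketch {
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern SPACE = Pattern.compile(" ");

    static double[] parse(String line) {
        String[] parts = COMMA.split(line);
        String[] tok = SPACE.split(parts[1]);
        double[] row = new double[tok.length + 1];
        row[0] = Double.parseDouble(parts[0]);       // row[0] holds the label
        for (int i = 0; i < tok.length; ++i) {
            row[i + 1] = Double.parseDouble(tok[i]); // remaining entries: features
        }
        return row;
    }

    public static void main(String[] args) {
        double[] row = parse("1.0,2.5 3.5 4.0");
        System.out.println(java.util.Arrays.toString(row)); // [1.0, 2.5, 3.5, 4.0]
    }
}
```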

@samthebest If I remove the impurity variable and change it to the following form:

Strategy strategy = new Strategy(Algo.Classification(), new org.apache.spark.mllib.tree.impurity.Entropy(), maxDepth, maxBins, null, null, maxMemory);

the error changes to: The constructor Entropy() is undefined.

[Edit] I found what I believe is the correct way to make this call (https://issues.apache.org/jira/browse/SPARK-2197):

Strategy strategy = new Strategy(Algo.Classification(), new Impurity() {
    @Override
    public double calculate(double arg0, double arg1, double arg2) {
        return Gini.calculate(arg0, arg1, arg2);
    }

    @Override
    public double calculate(double arg0, double arg1) {
        return Gini.calculate(arg0, arg1);
    }
}, 5, 100, QuantileStrategy.Sort(), null, 256);

Unfortunately, I still run into errors :(

Best Answer

The Java solution for bug SPARK-2197 is now available via this pull request:

Other improvements to Decision Trees for ease-of-use with Java:

* impurity classes: Added instance() methods to help with Java interface.
* Strategy: Added Java-friendly constructor --> Note: I removed quantileCalculationStrategy from the Java-friendly constructor since (a) it is a special class and (b) there is only 1 option currently. I suspect we will redo the API before the other options are included.

You can see a complete example that solves your problem using the instance() method of the Gini impurity here

Strategy strategy = new Strategy(Algo.Classification(), Gini.instance(), maxDepth, numClasses, maxBins, categoricalFeaturesInfo);
DecisionTreeModel model = DecisionTree$.MODULE$.train(rdd.rdd(), strategy);
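The `instance()` helpers that the pull request adds are, in effect, Java-friendly accessors for Scala singletons: a plain static method Java can call, instead of Scala's `MODULE$` field or a constructor that does not exist. A self-contained sketch of that pattern (a mock, not Spark's actual source):

```java
interface Impurity {
    double calculate(double c0, double c1);
}

// Mock of the pattern the pull request introduces: a private singleton plus
// a static instance() accessor, so Java callers never touch MODULE$ or `new`.
final class Gini implements Impurity {
    private static final Gini INSTANCE = new Gini();

    private Gini() {}

    /** Java-friendly accessor, mirroring Gini.instance() in the fixed API. */
    public static Gini instance() {
        return INSTANCE;
    }

    @Override
    public double calculate(double c0, double c1) {
        double total = c0 + c1;
        if (total == 0.0) return 0.0;
        double f0 = c0 / total, f1 = c1 / total;
        return 1.0 - f0 * f0 - f1 * f1; // Gini impurity for two classes
    }
}

public class InstanceSketch {
    public static void main(String[] args) {
        Impurity impurity = Gini.instance(); // compiles cleanly from Java
        System.out.println(impurity.calculate(5.0, 5.0)); // 0.5 for a 50/50 split
    }
}
```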

Regarding java - decision tree implementation problem in Apache Spark using Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24471561/
