java - Converting a JavaRDD of String to a JavaRDD of Vector

Reposted. Author: 行者123. Updated: 2023-11-30 07:37:36

I am trying to load a CSV file as a JavaRDD of String, and then want to get the data into a JavaRDD of Vector.

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;

import breeze.collection.mutable.SparseArray;
import scala.collection.immutable.Seq;

public class Trial {
    public void start() throws InstantiationException, IllegalAccessException,
            ClassNotFoundException {
        run();
    }

    private void run() {
        SparkConf conf = new SparkConf().setAppName("csvparser");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaRDD<String> data = jsc.textFile("C:/Users/kalraa2/Documents/trial.csv");
        JavaRDD<Vector> datamain = data.flatMap(null);
        MultivariateStatisticalSummary mat = Statistics.colStats(datamain.rdd());

        System.out.println(mat.mean());
    }

    private List<Vector> Seq(Vector dv) {
        // TODO Auto-generated method stub
        return null;
    }

    public static void main(String[] args) throws Exception {
        Trial trial = new Trial();
        trial.start();
    }
}

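The program runs without any errors, but when I try to run it on a Spark machine I get no output. Can anyone tell me whether the conversion from the String RDD to the Vector RDD is correct?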

My CSV file contains only a single column of floats.

Best answer

The null passed to this flatMap call is probably the problem:

JavaRDD<Vector> datamain = data.flatMap(null);
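One possible way to replace the null argument is to map each line to a vector instead of calling flatMap. The sketch below is not from the original answer: it shows only the per-line parsing step in plain Java so that it can be run without Spark, and the class name CsvLineParser and the method parseLine are hypothetical names chosen for illustration. It assumes every line holds one or more comma-separated numbers with no header row and no blank lines.

```java
import java.util.Arrays;
import java.util.List;

public class CsvLineParser {

    /**
     * Parses one CSV line into an array of doubles.
     * Hypothetical helper for illustration; it is not part of Spark.
     * Throws NumberFormatException if a field is not a valid number.
     */
    public static double[] parseLine(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the lines that jsc.textFile(...) would return.
        List<String> lines = Arrays.asList("1.5", "2.0", " 3.25 ");
        for (String line : lines) {
            System.out.println(Arrays.toString(parseLine(line)));
        }

        // Inside the Spark job from the question, assuming Java 8 lambdas and
        // spark-mllib on the classpath, the same helper could be used as:
        //
        //   JavaRDD<Vector> datamain =
        //       data.map(line -> Vectors.dense(CsvLineParser.parseLine(line)));
    }
}
```

Because each input line yields exactly one vector, map fits better than flatMap here. If the file may contain a header or empty lines, those would need to be filtered out before parsing.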

Regarding java - converting a JavaRDD of String to a JavaRDD of Vector, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35175007/
