
java - Cannot run the Java Spark Hive example


I have the following Java Spark Hive Example, as found in the official apache/spark repository on GitHub. I have spent a lot of time trying to understand how to run it in the Hortonworks Hadoop sandbox, so far without success.

Currently, I am doing the following:

  • Importing the apache/spark examples into my Maven project. This works fine and I have not run into any issues, so I don't think the problem is here.
  • The next step is preparing the code to run in my Hadoop sandbox. This is where the problems start, and I have probably set something up incorrectly. Here is what I am doing:

  • I set the SparkSession master to local and replace the spark.sql.warehouse.dir setting with hive.metastore.uris, pointing it at thrift://localhost:9083 (the value shown in the Hive config in Ambari):
    SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();

    Then I replace
    spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");

    with the HDFS path where I uploaded kv1.txt:
    spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");

    The last step is building the JAR with mvn package against the pom.xml. The build succeeds and gives me original-spark-examples_2.11-2.3.0-SNAPSHOT.jar.

    I copy the assembly to the Hadoop sandbox with scp -P 2222 ./target/original-spark-examples_2.11-2.3.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root
    and run the code with spark-submit: /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
    which returns the following error:
    [root@sandbox-hdp ~]# /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
    java.lang.ClassNotFoundException: JavaSparkHiveExample
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:739)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    [root@sandbox-hdp ~]#

    ...and here I am completely stuck. I am probably missing some step needed to prepare the code for running, or something similar.

    I would really appreciate some help getting this code to run on the Hadoop sandbox. I was able to run the JavaWordCount.java Spark example without problems, but with this one I am completely stuck. Thank you :)

    Complete JavaSparkHiveExample.java:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.examples.sql.hive;

// $example on:spark_hive$
import java.io.File;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// $example off:spark_hive$

public class JavaSparkHiveExample {

  // $example on:spark_hive$
  public static class Record implements Serializable {
    private int key;
    private String value;

    public int getKey() {
      return key;
    }

    public void setKey(int key) {
      this.key = key;
    }

    public String getValue() {
      return value;
    }

    public void setValue(String value) {
      this.value = value;
    }
  }
  // $example off:spark_hive$

  public static void main(String[] args) {
    // $example on:spark_hive$
    // warehouseLocation points to the default location for managed databases and tables
    String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate();

    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive");
    spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");

    // Queries are expressed in HiveQL
    spark.sql("SELECT * FROM src").show();
    // +---+-------+
    // |key|  value|
    // +---+-------+
    // |238|val_238|
    // | 86| val_86|
    // |311|val_311|
    // ...

    // Aggregation queries are also supported.
    spark.sql("SELECT COUNT(*) FROM src").show();
    // +--------+
    // |count(1)|
    // +--------+
    // |     500|
    // +--------+

    // The results of SQL queries are themselves DataFrames and support all normal functions.
    Dataset<Row> sqlDF = spark.sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key");

    // The items in DataFrames are of type Row, which lets you to access each column by ordinal.
    Dataset<String> stringsDS = sqlDF.map(
        (MapFunction<Row, String>) row -> "Key: " + row.get(0) + ", Value: " + row.get(1),
        Encoders.STRING());
    stringsDS.show();
    // +--------------------+
    // |               value|
    // +--------------------+
    // |Key: 0, Value: val_0|
    // |Key: 0, Value: val_0|
    // |Key: 0, Value: val_0|
    // ...

    // You can also use DataFrames to create temporary views within a SparkSession.
    List<Record> records = new ArrayList<>();
    for (int key = 1; key < 100; key++) {
      Record record = new Record();
      record.setKey(key);
      record.setValue("val_" + key);
      records.add(record);
    }
    Dataset<Row> recordsDF = spark.createDataFrame(records, Record.class);
    recordsDF.createOrReplaceTempView("records");

    // Queries can then join DataFrames data with data stored in Hive.
    spark.sql("SELECT * FROM records r JOIN src s ON r.key = s.key").show();
    // +---+------+---+------+
    // |key| value|key| value|
    // +---+------+---+------+
    // |  2| val_2|  2| val_2|
    // |  2| val_2|  2| val_2|
    // |  4| val_4|  4| val_4|
    // ...
    // $example off:spark_hive$

    spark.stop();
  }
}

    Best answer

    The class name always needs to be fully qualified:
    --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample
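
    With that change, the spark-submit invocation from the question would look like this (same JAR name and paths as above):

    /usr/hdp/current/spark2-client/bin/spark-submit \
      --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample \
      --master local \
      ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar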

    spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src"); cannot read from the hdfs, how could I solve this



    A few options (a rough sketch of these follows the list):
  • Remove LOCAL ... that keyword means the path is not read from HDFS.
  • Build an EXTERNAL TABLE in Hive over the existing file and query it from Spark.
  • Read the file directly into a Dataset with Spark ... it is not clear that Hive is needed at all, but if it is, Spark can write the Dataset into a Hive table.
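
    Not an authoritative fix, just a minimal sketch of these options. The table name, HDFS path, and metastore URI are reused from the question; the class name HiveLoadOptionsSketch, the external-table name src_ext and its LOCATION directory, and the assumption that kv1.txt uses Hive's default '\001' field delimiter are illustrative, not taken from the original post.

    package org.apache.spark.examples.sql.hive;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveLoadOptionsSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("Hive Load Options Sketch")
            .master("local[*]")
            .config("hive.metastore.uris", "thrift://localhost:9083") // value taken from the question
            .enableHiveSupport()
            .getOrCreate();

        spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive");

        // Option 1: drop LOCAL so the path is resolved against HDFS, not the local filesystem.
        spark.sql("LOAD DATA INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");

        // Option 2: an external table over a directory that already contains the file
        // (LOCATION must be a directory, e.g. hdfs:///tmp/src_ext/ holding kv1.txt).
        spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS src_ext (key INT, value STRING) "
            + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' "
            + "LOCATION 'hdfs:///tmp/src_ext'");

        // Option 3: skip LOAD DATA entirely and read the file straight into a Dataset,
        // assuming kv1.txt uses Hive's default '\u0001' field delimiter.
        Dataset<Row> kv = spark.read()
            .option("sep", "\u0001")
            .csv("hdfs:///tmp/kv1.txt")
            .toDF("key", "value");
        kv.show();

        // If the data should still end up in Hive, the Dataset can be saved as a table.
        kv.write().mode("overwrite").saveAsTable("src_from_dataset");

        spark.stop();
      }
    }

    Of these, reading the file directly (option 3) avoids the LOAD DATA step entirely and only involves Hive if the Dataset is explicitly written to a table, which matches the point above that Hive may not be needed at all.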
    A similar question about "java - Cannot run the Java Spark Hive example" can be found on Stack Overflow: https://stackoverflow.com/questions/47876352/
