
java - How to run a simple Spark application from the Eclipse/IntelliJ IDE?


To make it easier to develop map reduce jobs that run on Hadoop before actually deploying them to Hadoop, I test with a simple map reducer I wrote:

object mapreduce {
  import scala.collection.JavaConversions._

  val intermediate = new java.util.HashMap[String, java.util.List[Int]]
                                                  //> intermediate : java.util.HashMap[String,java.util.List[Int]] = {}
  val result = new java.util.ArrayList[Int]       //> result : java.util.ArrayList[Int] = []

  def emitIntermediate(key: String, value: Int) {
    if (!intermediate.containsKey(key)) {
      intermediate.put(key, new java.util.ArrayList)
    }
    intermediate.get(key).add(value)
  }                                               //> emitIntermediate: (key: String, value: Int)Unit

  def emit(value: Int) {
    println("value is " + value)
    result.add(value)
  }                                               //> emit: (value: Int)Unit

  def execute(data: java.util.List[String], mapper: String => Unit, reducer: (String, java.util.List[Int]) => Unit) {

    for (line <- data) {
      mapper(line)
    }

    for (keyVal <- intermediate) {
      reducer(keyVal._1, intermediate.get(keyVal._1))
    }

    for (item <- result) {
      println(item)
    }
  }                                               //> execute: (data: java.util.List[String], mapper: String => Unit, reducer: (String, java.util.List[Int]) => Unit)Unit

  def mapper(record: String) {
    var jsonAttributes = com.nebhale.jsonpath.JsonPath.read("$", record, classOf[java.util.ArrayList[String]])
    println("jsonAttributes are " + jsonAttributes)
    var key = jsonAttributes.get(0)
    var value = jsonAttributes.get(1)

    println("key is " + key)
    var delims = "[ ]+";
    var words = value.split(delims);
    for (w <- words) {
      emitIntermediate(w, 1)
    }
  }                                               //> mapper: (record: String)Unit

  def reducer(key: String, listOfValues: java.util.List[Int]) = {
    var total = 0
    for (value <- listOfValues) {
      total += value;
    }

    emit(total)
  }                                               //> reducer: (key: String, listOfValues: java.util.List[Int])Unit

  var dataToProcess = new java.util.ArrayList[String]
                                                  //> dataToProcess : java.util.ArrayList[String] = []
  dataToProcess.add("[\"test1\" , \"test1 here is another test1 test1 \"]")
                                                  //> res0: Boolean = true
  dataToProcess.add("[\"test2\" , \"test2 here is another test2 test1 \"]")
                                                  //> res1: Boolean = true

  execute(dataToProcess, mapper, reducer)         //> jsonAttributes are [test1, test1 here is another test1 test1 ]
                                                  //| key is test1
                                                  //| jsonAttributes are [test2, test2 here is another test2 test1 ]
                                                  //| key is test2
                                                  //| value is 2
                                                  //| value is 2
                                                  //| value is 4
                                                  //| value is 2
                                                  //| value is 2
                                                  //| 2
                                                  //| 2
                                                  //| 4
                                                  //| 2
                                                  //| 2

  for (keyValue <- intermediate) {
    println(keyValue._1 + "->" + keyValue._2.size)//> another->2
                                                  //| is->2
                                                  //| test1->4
                                                  //| here->2
                                                  //| test2->2
  }

}

This lets me run my mapreduce jobs in the Eclipse IDE on Windows before deploying them to an actual Hadoop cluster. I would like to do the same for Spark, i.e. write Spark code in Eclipse and test it before deploying it to a Spark cluster. Is this possible with Spark? Since Spark runs on top of Hadoop, does that mean I have to install Hadoop first before I can run Spark? In other words, can I run code using just the Spark libraries, like this:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "$YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

Taken from https://spark.apache.org/docs/0.9.0/quick-start.html#a-standalone-app-in-scala

If so, which Spark libraries do I need to include in my project?

Best Answer

Add the following to your build.sbt: libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1", and make sure your scalaVersion is set (e.g. scalaVersion := "2.10.3").
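
For reference, a minimal build.sbt along those lines might look like the sketch below; the project name and version are placeholders I chose, not part of the original answer:

// Minimal build.sbt sketch; name and version are placeholders
name := "simple-project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

With this in place the application can be run directly from the IDE, or from the command line with sbt run.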

Also, if you are just running the program locally, you can skip the last two arguments to SparkContext, like so: val sc = new SparkContext("local", "Simple App").
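
Putting that together, a version of the quick-start app trimmed for a purely local run might look like this sketch (the log file path is still a placeholder you must point at a real file; "local" keeps everything inside the IDE's JVM):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    // Placeholder: replace with a real file on your system
    val logFile = "$YOUR_SPARK_HOME/README.md"
    // "local" master runs Spark inside this JVM; no cluster or Hadoop install needed
    val sc = new SparkContext("local", "Simple App")
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}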

Finally, Spark can run on Hadoop, but it can also run standalone. See: https://spark.apache.org/docs/0.9.1/spark-standalone.html
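
If you later do want to target a standalone cluster instead of running locally, the usual pattern in the 0.9-era API (a sketch only; the master host, Spark home, and jar path below are placeholders) is to pass the standalone master's URL in place of "local":

// Hypothetical standalone-cluster variant: host, Spark home, and jar path are placeholders
val sc = new SparkContext("spark://master-host:7077", "Simple App",
  "/path/to/spark", List("target/scala-2.10/simple-project_2.10-1.0.jar"))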

Regarding java - How to run a simple Spark application from the Eclipse/IntelliJ IDE?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22639137/
