scala - Cannot call methods on a stopped SparkContext

Reposted by: 行者123. Updated: 2023-12-02 08:21:15

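When I run the test below, it throws "Cannot call methods on a stopped SparkContext". The likely cause is that I am using TestSuiteBase together with a streaming Spark context. On the line val gridEvalsRDD = ssc.sparkContext.parallelize(gridEvals) I need a SparkContext, which I access through ssc.sparkContext, and that is where the problem occurs (see the warning and error message below).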

class StreamingTest extends TestSuiteBase with BeforeAndAfter {

  test("Test 1") {
    //...
    val gridEvals = for (initialWeights <- gridParams("initialWeights");
                         stepSize <- gridParams("stepSize");
                         numIterations <- gridParams("numIterations")) yield {
      val lr = new StreamingLinearRegressionWithSGD()
        .setInitialWeights(initialWeights.asInstanceOf[Vector])
        .setStepSize(stepSize.asInstanceOf[Double])
        .setNumIterations(numIterations.asInstanceOf[Int])

      ssc = setupStreams(inputData, (inputDStream: DStream[LabeledPoint]) => {
        lr.trainOn(inputDStream)
        lr.predictOnValues(inputDStream.map(x => (x.label, x.features)))
      })

      val output: Seq[Seq[(Double, Double)]] = runStreams(ssc, numBatches, numBatches)
      val cvRMSE = calculateRMSE(output, nPoints)
      println(s"RMSE = $cvRMSE")
      (initialWeights, stepSize, numIterations, cvRMSE)
    }

    val gridEvalsRDD = ssc.sparkContext.parallelize(gridEvals)
  }
}

16/04/27 10:40:17 WARN StreamingContext: StreamingContext has already been stopped
16/04/27 10:40:17 INFO SparkContext: SparkContext already stopped.

Cannot call methods on a stopped SparkContext

Update:

This is the base class TestSuiteBase:

trait TestSuiteBase extends SparkFunSuite with BeforeAndAfter with Logging {

  // Name of the framework for Spark context
  def framework: String = this.getClass.getSimpleName

  // Master for Spark context
  def master: String = "local[2]"

  // Batch duration
  def batchDuration: Duration = Seconds(1)

  // Directory where the checkpoint data will be saved
  lazy val checkpointDir: String = {
    val dir = Utils.createTempDir()
    logDebug(s"checkpointDir: $dir")
    dir.toString
  }

  // Number of partitions of the input parallel collections created for testing
  def numInputPartitions: Int = 2

  // Maximum time to wait before the test times out
  def maxWaitTimeMillis: Int = 10000

  // Whether to use manual clock or not
  def useManualClock: Boolean = true

  // Whether to actually wait in real time before changing manual clock
  def actuallyWait: Boolean = false

  // A SparkConf to use in tests. Can be modified before calling setupStreams to configure things.
  val conf = new SparkConf()
    .setMaster(master)
    .setAppName(framework)

  // Timeout for use in ScalaTest `eventually` blocks
  val eventuallyTimeout: PatienceConfiguration.Timeout = timeout(Span(10, ScalaTestSeconds))

  // Default before function for any streaming test suite. Override this
  // if you want to add your stuff to "before" (i.e., don't call before { } )
  def beforeFunction() {
    if (useManualClock) {
      logInfo("Using manual clock")
      conf.set("spark.streaming.clock", "org.apache.spark.util.ManualClock")
    } else {
      logInfo("Using real clock")
      conf.set("spark.streaming.clock", "org.apache.spark.util.SystemClock")
    }
  }

  // Default after function for any streaming test suite. Override this
  // if you want to add your stuff to "after" (i.e., don't call after { } )
  def afterFunction() {
    System.clearProperty("spark.streaming.clock")
  }

  before(beforeFunction)
  after(afterFunction)

  /**
   * Run a block of code with the given StreamingContext and automatically
   * stop the context when the block completes or when an exception is thrown.
   */
  def withStreamingContext[R](ssc: StreamingContext)(block: StreamingContext => R): R = {
    try {
      block(ssc)
    } finally {
      try {
        ssc.stop(stopSparkContext = true)
      } catch {
        case e: Exception =>
          logError("Error stopping StreamingContext", e)
      }
    }
  }

  /**
   * Run a block of code with the given TestServer and automatically
   * stop the server when the block completes or when an exception is thrown.
   */
  def withTestServer[R](testServer: TestServer)(block: TestServer => R): R = {
    try {
      block(testServer)
    } finally {
      try {
        testServer.stop()
      } catch {
        case e: Exception =>
          logError("Error stopping TestServer", e)
      }
    }
  }

  /**
   * Set up required DStreams to test the DStream operation using the two sequences
   * of input collections.
   */
  def setupStreams[U: ClassTag, V: ClassTag](
      input: Seq[Seq[U]],
      operation: DStream[U] => DStream[V],
      numPartitions: Int = numInputPartitions
    ): StreamingContext = {
    // Create StreamingContext
    val ssc = new StreamingContext(conf, batchDuration)
    if (checkpointDir != null) {
      ssc.checkpoint(checkpointDir)
    }

    // Setup the stream computation
    val inputStream = new TestInputStream(ssc, input, numPartitions)
    val operatedStream = operation(inputStream)
    val outputStream = new TestOutputStreamWithPartitions(operatedStream,
      new ArrayBuffer[Seq[Seq[V]]] with SynchronizedBuffer[Seq[Seq[V]]])
    outputStream.register()
    ssc
  }

  /**
   * Set up required DStreams to test the binary operation using the sequence
   * of input collections.
   */
  def setupStreams[U: ClassTag, V: ClassTag, W: ClassTag](
      input1: Seq[Seq[U]],
      input2: Seq[Seq[V]],
      operation: (DStream[U], DStream[V]) => DStream[W]
    ): StreamingContext = {
    // Create StreamingContext
    val ssc = new StreamingContext(conf, batchDuration)
    if (checkpointDir != null) {
      ssc.checkpoint(checkpointDir)
    }

    // Setup the stream computation
    val inputStream1 = new TestInputStream(ssc, input1, numInputPartitions)
    val inputStream2 = new TestInputStream(ssc, input2, numInputPartitions)
    val operatedStream = operation(inputStream1, inputStream2)
    val outputStream = new TestOutputStreamWithPartitions(operatedStream,
      new ArrayBuffer[Seq[Seq[W]]] with SynchronizedBuffer[Seq[Seq[W]]])
    outputStream.register()
    ssc
  }

  /**
   * Runs the streams set up in `ssc` on manual clock for `numBatches` batches and
   * returns the collected output. It will wait until `numExpectedOutput` number of
   * output data has been collected or timeout (set by `maxWaitTimeMillis`) is reached.
   *
   * Returns a sequence of items for each RDD.
   */
  def runStreams[V: ClassTag](
      ssc: StreamingContext,
      numBatches: Int,
      numExpectedOutput: Int
    ): Seq[Seq[V]] = {
    // Flatten each RDD into a single Seq
    runStreamsWithPartitions(ssc, numBatches, numExpectedOutput).map(_.flatten.toSeq)
  }

  /**
   * Runs the streams set up in `ssc` on manual clock for `numBatches` batches and
   * returns the collected output. It will wait until `numExpectedOutput` number of
   * output data has been collected or timeout (set by `maxWaitTimeMillis`) is reached.
   *
   * Returns a sequence of RDD's. Each RDD is represented as several sequences of items, each
   * representing one partition.
   */
  def runStreamsWithPartitions[V: ClassTag](
      ssc: StreamingContext,
      numBatches: Int,
      numExpectedOutput: Int
    ): Seq[Seq[Seq[V]]] = {
    assert(numBatches > 0, "Number of batches to run stream computation is zero")
    assert(numExpectedOutput > 0, "Number of expected outputs after " + numBatches + " is zero")
    logInfo("numBatches = " + numBatches + ", numExpectedOutput = " + numExpectedOutput)

    // Get the output buffer
    val outputStream = ssc.graph.getOutputStreams.
      filter(_.isInstanceOf[TestOutputStreamWithPartitions[_]]).
      head.asInstanceOf[TestOutputStreamWithPartitions[V]]
    val output = outputStream.output

    try {
      // Start computation
      ssc.start()

      // Advance manual clock
      val clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
      logInfo("Manual clock before advancing = " + clock.getTimeMillis())
      if (actuallyWait) {
        for (i <- 1 to numBatches) {
          logInfo("Actually waiting for " + batchDuration)
          clock.advance(batchDuration.milliseconds)
          Thread.sleep(batchDuration.milliseconds)
        }
      } else {
        clock.advance(numBatches * batchDuration.milliseconds)
      }
      logInfo("Manual clock after advancing = " + clock.getTimeMillis())

      // Wait until expected number of output items have been generated
      val startTime = System.currentTimeMillis()
      while (output.size < numExpectedOutput &&
        System.currentTimeMillis() - startTime < maxWaitTimeMillis) {
        logInfo("output.size = " + output.size + ", numExpectedOutput = " + numExpectedOutput)
        ssc.awaitTerminationOrTimeout(50)
      }
      val timeTaken = System.currentTimeMillis() - startTime
      logInfo("Output generated in " + timeTaken + " milliseconds")
      output.foreach(x => logInfo("[" + x.mkString(",") + "]"))
      assert(timeTaken < maxWaitTimeMillis, "Operation timed out after " + timeTaken + " ms")
      assert(output.size === numExpectedOutput, "Unexpected number of outputs generated")

      Thread.sleep(100) // Give some time for the forgetting old RDDs to complete
    } finally {
      ssc.stop(stopSparkContext = true)
    }
    output
  }

  /**
   * Verify whether the output values after running a DStream operation
   * is same as the expected output values, by comparing the output
   * collections either as lists (order matters) or sets (order does not matter)
   */
  def verifyOutput[V: ClassTag](
      output: Seq[Seq[V]],
      expectedOutput: Seq[Seq[V]],
      useSet: Boolean
    ) {
    logInfo("--------------------------------")
    logInfo("output.size = " + output.size)
    logInfo("output")
    output.foreach(x => logInfo("[" + x.mkString(",") + "]"))
    logInfo("expected output.size = " + expectedOutput.size)
    logInfo("expected output")
    expectedOutput.foreach(x => logInfo("[" + x.mkString(",") + "]"))
    logInfo("--------------------------------")

    // Match the output with the expected output
    for (i <- 0 until output.size) {
      if (useSet) {
        assert(
          output(i).toSet === expectedOutput(i).toSet,
          s"Set comparison failed\n" +
            s"Expected output (${expectedOutput.size} items):\n${expectedOutput.mkString("\n")}\n" +
            s"Generated output (${output.size} items): ${output.mkString("\n")}"
        )
      } else {
        assert(
          output(i).toList === expectedOutput(i).toList,
          s"Ordered list comparison failed\n" +
            s"Expected output (${expectedOutput.size} items):\n${expectedOutput.mkString("\n")}\n" +
            s"Generated output (${output.size} items): ${output.mkString("\n")}"
        )
      }
    }
    logInfo("Output verified successfully")
  }

  /**
   * Test unary DStream operation with a list of inputs, with number of
   * batches to run same as the number of expected output values
   */
  def testOperation[U: ClassTag, V: ClassTag](
      input: Seq[Seq[U]],
      operation: DStream[U] => DStream[V],
      expectedOutput: Seq[Seq[V]],
      useSet: Boolean = false
    ) {
    testOperation[U, V](input, operation, expectedOutput, -1, useSet)
  }

  /**
   * Test unary DStream operation with a list of inputs
   * @param input      Sequence of input collections
   * @param operation  Unary DStream operation to be applied to the input
   * @param expectedOutput Sequence of expected output collections
   * @param numBatches Number of batches to run the operation for
   * @param useSet     Compare the output values with the expected output values
   *                   as sets (order does not matter) or as lists (order matters)
   */
  def testOperation[U: ClassTag, V: ClassTag](
      input: Seq[Seq[U]],
      operation: DStream[U] => DStream[V],
      expectedOutput: Seq[Seq[V]],
      numBatches: Int,
      useSet: Boolean
    ) {
    val numBatches_ = if (numBatches > 0) numBatches else expectedOutput.size
    withStreamingContext(setupStreams[U, V](input, operation)) { ssc =>
      val output = runStreams[V](ssc, numBatches_, expectedOutput.size)
      verifyOutput[V](output, expectedOutput, useSet)
    }
  }

  /**
   * Test binary DStream operation with two lists of inputs, with number of
   * batches to run same as the number of expected output values
   */
  def testOperation[U: ClassTag, V: ClassTag, W: ClassTag](
      input1: Seq[Seq[U]],
      input2: Seq[Seq[V]],
      operation: (DStream[U], DStream[V]) => DStream[W],
      expectedOutput: Seq[Seq[W]],
      useSet: Boolean
    ) {
    testOperation[U, V, W](input1, input2, operation, expectedOutput, -1, useSet)
  }

  /**
   * Test binary DStream operation with two lists of inputs
   * @param input1     First sequence of input collections
   * @param input2     Second sequence of input collections
   * @param operation  Binary DStream operation to be applied to the 2 inputs
   * @param expectedOutput Sequence of expected output collections
   * @param numBatches Number of batches to run the operation for
   * @param useSet     Compare the output values with the expected output values
   *                   as sets (order does not matter) or as lists (order matters)
   */
  def testOperation[U: ClassTag, V: ClassTag, W: ClassTag](
      input1: Seq[Seq[U]],
      input2: Seq[Seq[V]],
      operation: (DStream[U], DStream[V]) => DStream[W],
      expectedOutput: Seq[Seq[W]],
      numBatches: Int,
      useSet: Boolean
    ) {
    val numBatches_ = if (numBatches > 0) numBatches else expectedOutput.size
    withStreamingContext(setupStreams[U, V, W](input1, input2, operation)) { ssc =>
      val output = runStreams[W](ssc, numBatches_, expectedOutput.size)
      verifyOutput[W](output, expectedOutput, useSet)
    }
  }
}

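Best answer

Here are a few things you should check:

Verify that the resources specified in your Spark configuration are actually available.

Search your codebase for the stop() keyword and make sure it is not being called on the SparkContext.

Spark has a Spark UI component where you can see which jobs ran, whether they failed or succeeded, and their logs. That will tell you why it failed.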
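
To make the stop() check concrete: in the base class quoted in the question, runStreamsWithPartitions calls ssc.stop(stopSparkContext = true) in its finally block, so the SparkContext behind ssc.sparkContext is already stopped by the time runStreams returns. One possible workaround, sketched here under assumptions, is to do the post-processing on a fresh SparkContext. The object name StoppedContextDemo and the helper sumWithFreshContext are hypothetical names chosen for illustration. Only SparkConf, SparkContext, parallelize, reduce and stop come from Spark itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

// Hypothetical demo object, not part of Spark or of the original test suite.
object StoppedContextDemo {

  // Hypothetical helper: creates a new SparkContext, sums the data on it,
  // and always stops the context afterwards.
  def sumWithFreshContext(conf: SparkConf, data: Seq[Int]): Int = {
    val sc = new SparkContext(conf)
    try sc.parallelize(data).reduce(_ + _)
    finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("StoppedContextDemo")

    // Reproduce the situation from the question: the context is stopped first...
    val first = new SparkContext(conf)
    first.stop()

    // ...so any later call on it fails with
    // "Cannot call methods on a stopped SparkContext".
    val attempt = Try(first.parallelize(Seq(1, 2, 3)).count())
    println(s"Call on stopped context failed: ${attempt.isFailure}")

    // A fresh context built from the same configuration works.
    println(s"Sum with fresh context: ${sumWithFreshContext(conf, Seq(1, 2, 3))}")
  }
}
```

In the test from the question, the same idea would mean building a new SparkContext from conf after the loop instead of reusing ssc.sparkContext. Whether that fits depends on the rest of the suite.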

Regarding "scala - Cannot call methods on a stopped SparkContext", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36884845/
