- android - RelativeLayout 背景可绘制重叠内容
- android - 如何链接 cpufeatures lib 以获取 native android 库?
- java - OnItemClickListener 不起作用,但 OnLongItemClickListener 在自定义 ListView 中起作用
- java - Android 文件转字符串
我正在使用 Spark 读取一堆文件,对它们进行详细说明,然后将它们全部保存为序列文件。我想要的是每个分区有 1 个序列文件,所以我这样做了:
SparkConf sparkConf = new SparkConf().setAppName("writingHDFS")
.setMaster("local[2]")
.set("spark.streaming.stopGracefullyOnShutdown", "true");
final JavaSparkContext jsc = new JavaSparkContext(sparkConf);
jsc.hadoopConfiguration().addResource(hdfsConfPath + "hdfs-site.xml");
jsc.hadoopConfiguration().addResource(hdfsConfPath + "core-site.xml");
//JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5*1000));
JavaPairRDD<String, PortableDataStream> imageByteRDD = jsc.binaryFiles(sourcePath);
if(!imageByteRDD.isEmpty())
imageByteRDD.foreachPartition(new VoidFunction<Iterator<Tuple2<String,PortableDataStream>>>() {
@Override
public void call(Iterator<Tuple2<String, PortableDataStream>> arg0){
throws Exception {
[°°°SOME STUFF°°°]
SequenceFile.Writer writer = SequenceFile.createWriter(
jsc.hadoopConfiguration(),
//here lies the problem: how to pass the hadoopConfiguration I have put inside the Spark Context?
Previously, I created a Configuration for each partition, and it works, but I'm sure there is a much more "sparky way"
有人知道如何在 RDD 闭包内部使用 Hadoop 配置对象吗?
最佳答案
这里的问题是 Hadoop 配置没有被标记为 Serializable
,因此 Spark 不会将它们拉入 RDD。它们被标记为Writable
,因此 Hadoop 的序列化机制可以对它们进行编码和解码,但 Spark 不直接使用它
两个长期修复选项是
您不会对使 Hadoop conf 可序列化提出任何主要反对意见;如果您实现了委托(delegate)给可写 IO 调用的自定义 ser/deser 方法(并且只是遍历所有键/值对)。我是作为 Hadoop 提交者这么说的。
更新:下面是创建可序列化类的代码,该类确实编码 Hadoop 配置的内容。使用 val ser = new ConfSerDeser(hadoopConf)
创建它;在您的 RDD 中将其称为 ser.get()
。
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.hadoop.conf.Configuration
/**
* Class to make Hadoop configurations serializable; uses the
* `Writeable` operations to do this.
* Note: this only serializes the explicitly set values, not any set
* in site/default or other XML resources.
* @param conf
*/
class ConfigSerDeser(var conf: Configuration) extends Serializable {
def this() {
this(new Configuration())
}
def get(): Configuration = conf
private def writeObject (out: java.io.ObjectOutputStream): Unit = {
conf.write(out)
}
private def readObject (in: java.io.ObjectInputStream): Unit = {
conf = new Configuration()
conf.readFields(in)
}
private def readObjectNoData(): Unit = {
conf = new Configuration()
}
}
请注意,对于某些人来说,为所有可写类创建泛型会相对简单;您只需要在构造函数中提供一个类名,并在反序列化期间使用它来实例化可写对象。
关于java - 在 RDD 方法/闭包中使用 SparkContext hadoop 配置,例如 foreachPartition,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38224132/
我在一个简单的 GTK 应用程序中有两个小部件: extern crate gdk; extern crate gtk; use super::desktop_entry::DesktopEntry;
我想做这样的事情: const vegetableColors = {corn: 'yellow', peas: 'green'}; const {*} = vegetableColors; cons
该属性它存储在 gradle 中的什么位置? subprojects { println it.class.name // DefaultProject_Decorated depen
我想在 jQuery 闭包中看到窗口属性“otherName”描述符。但 进入 jQuery 闭包 'otherName' 描述符显示未定义,我认为可能 是 getOwnPropertyDescrip
我曾经看过 Douglas Crockford 的一次演讲,在 javascript 的上下文中,他提到将 secret 存储在闭包中可能很有用。 我想这可以在 Java 中像这样天真地实现: pub
我很难理解 Swift 中闭包中真正发生的事情,希望有人能帮助我理解。 class MyClass { func printWhatever(words: String) {
我有两个 3 表:用户、个人资料、friend_request $my_profile_id变量存储用户个人资料ID的值 $my_user_id = Auth::user()->id; $my_pro
我正在尝试通过使用 GLFW 的包装来学习 Swift GLFW 允许添加错误回调: GLFWAPI GLFWerrorfun glfwSetErrorCallback(GLFWerrorfun cb
我是一名优秀的程序员,十分优秀!