
apache-spark - Is there an Apache Spark counterpart to Hadoop Streaming?

Reposted. Author: 可可西里. Updated: 2023-11-01 16:37:06

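I have some highly customized processing logic that I want to implement in C++. Hadoop Streaming lets me integrate C++-coded logic into a MapReduce processing pipeline. I would like to know whether I can do the same with Apache Spark.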

Best answer

The closest (though not exactly equivalent) solution is the RDD.pipe method:

Return an RDD created by piping elements to a forked external process. The resulting RDD is computed by executing the given process once per partition. All elements of each input partition are written to a process's stdin as lines of input separated by a newline. The resulting partition consists of the process's stdout output, with each line of stdout resulting in one element of the output partition. A process is invoked even for empty partitions.

The print behavior can be customized by providing two functions.

The Spark test suite provides many usage examples.

Regarding "apache-spark - Is there an Apache Spark counterpart to Hadoop Streaming?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49881518/
