java - Apache Flink: Custom InputFormat only runs with parallelism of 1


Reposted by: 行者123 · Updated: 2023-11-30 02:13:04


I am implementing a custom input format for Apache Flink. I created a dummy input format that returns 3 rows.

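import java.io.IOException;

import org.apache.flink.api.common.io.GenericInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.types.Row;

public class ElasticsearchInputFormat extends GenericInputFormat<Row> {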
    @Override
    public void configure(Configuration parameters) {
        System.out.println("configuring");
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException {
        return cachedStatistics;
    }

    @Override
    public void open(GenericInputSplit split) throws IOException {
        System.out.println("opening: " + split);
        super.open(split);
    }

    @Override
    public void close() throws IOException {
        System.out.println("closing");
        super.close();
    }

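    // Counts calls to reachedEnd(); end-of-input is reported after three records.
    private int a = 0;

    @Override
    public boolean reachedEnd() throws IOException {
        a++;
        return a > 3;
    }

    @Override
    public Row nextRecord(Row reuse) throws IOException {
        // Ignores the reuse object and builds a fresh two-field row each time.
        Row r = new Row(2);
        r.setField(0, "osman");
        r.setField(1, "wow");
        return r;
    }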
}

My sample code is as follows:

final ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
env.setParallelism(8);

DataSource<Row> input = env.createInput(new ElasticsearchInputFormat());

input.print();

However, although the parallelism is set to 8, it prints:

configuring
opening: GenericSplit (0/1)
closing
osman,wow
osman,wow
osman,wow

Why is it not parallelized? I would like to have multiple splits so that the input can be consumed by other operators in parallel.

Best answer

createCollectionsEnvironment() returns a special environment whose parallelism is implicitly 1. From the Javadocs:

Creates a {@link CollectionEnvironment} that uses Java Collections underneath. This will execute in a single thread in the current JVM. It is very fast but will fail if the data does not fit into memory. parallelism will always be 1. This is useful during implementation and for debugging.
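
Given that Javadoc, the likely way out in Flink is to obtain the environment from ExecutionEnvironment.getExecutionEnvironment() or ExecutionEnvironment.createLocalEnvironment(8) instead, so that the source can actually be given more than one split. What happens next is worth sketching: once there are several splits, a format that ignores its split would presumably emit the same rows once per split. The block below is a plain-Java model of that effect, not Flink code. The class SplitModel, its Split type and all method names are hypothetical, and the modulo assignment of rows to splits is only one possible scheme.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of split-based reading; it does not use any Flink classes. */
class SplitModel {

    /** Hypothetical stand-in for an input split: its index and the total split count. */
    static final class Split {
        final int number;
        final int total;

        Split(int number, int total) {
            this.number = number;
            this.total = total;
        }

        @Override
        public String toString() {
            return "GenericSplit (" + number + "/" + total + ")";
        }
    }

    /** Creates one split per parallel reader (an assumption of this model). */
    static List<Split> createSplits(int parallelism) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            splits.add(new Split(i, parallelism));
        }
        return splits;
    }

    /** Naive reader: returns the same three rows for every split, like the dummy format. */
    static List<String> readNaive(Split split) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add("osman,wow");
        }
        return rows;
    }

    /** Split-aware reader: row i is emitted only by the split with number i % total. */
    static List<String> readSplitAware(Split split, int totalRows) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            if (i % split.total == split.number) {
                rows.add("row-" + i);
            }
        }
        return rows;
    }

    /** Total number of rows the naive reader produces across all splits. */
    static int countNaive(int parallelism) {
        int count = 0;
        for (Split split : createSplits(parallelism)) {
            count += readNaive(split).size();
        }
        return count;
    }

    /** Total number of rows the split-aware reader produces across all splits. */
    static int countSplitAware(int parallelism, int totalRows) {
        int count = 0;
        for (Split split : createSplits(parallelism)) {
            count += readSplitAware(split, totalRows).size();
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(createSplits(1).get(0));
        System.out.println("naive rows at parallelism 8: " + countNaive(8));
        System.out.println("split-aware rows at parallelism 8: " + countSplitAware(8, 3));
    }
}
```

In this model the naive reader yields 24 rows at parallelism 8, while the split-aware reader still yields 3. A real input format would make the same kind of decision in open(GenericInputSplit split), using the split number and total it receives to pick its share of the data.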

Regarding java - Apache Flink: Custom InputFormat only runs with parallelism of 1, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49543205/
