
java - MapReduce input from HTable times out on AWS

Reposted — author: 行者123 · updated 2023-12-02 20:58:38

I'm having trouble figuring out how to run a simple MapReduce job that takes its input from an HTable, using emr-5.4.0.
When I run it on EMR, it fails with a timeout. (It also fails on emr-5.3.0.)

I did a bunch of googling to figure out how to proceed, but couldn't find anything useful.

My process:

  • I created an EMR cluster with HBase. The versions are:
    Amazon 2.7.3, Ganglia 3.7.2, HBase 1.3.0, Hive 2.1.1, Hue 3.11.0, Phoenix 4.9.0


  • Following the example in the manual (http://hbase.apache.org/book.html#mapreduce.example), I wrote my job like this:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class TableMapTest3 {
        // TableMapper: emits (row key, contents:name value) pairs
        public static class MyMapper extends TableMapper<Text, Text> {

            protected void map(ImmutableBytesWritable key, Result inputValue, Context context)
                    throws IOException, InterruptedException {
                String keyS = new String(key.get(), "UTF-8");
                String value = new String(inputValue.getValue(Bytes.toBytes("contents"), Bytes.toBytes("name")), "UTF-8");
                System.out.println("TokenizerMapper :" + value);
                context.write(new Text(keyS), new Text(value));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            System.out.println("url:" + conf.get("fs.defaultFS"));
            System.out.println("hbase.zookeeper.quorum:" + conf.get("hbase.zookeeper.quorum"));
            Connection conn = ConnectionFactory.createConnection(conf);

            Admin admin = conn.getAdmin();
            String tableName = "TableMapTest";
            TableName tablename = TableName.valueOf(tableName);

            Table hTable = null;
            // check whether the table exists; if so, empty it
            if (admin.tableExists(tablename)) {
                System.out.println(tablename + " table existed...");
                hTable = conn.getTable(tablename);
                ResultScanner resultScanner = hTable.getScanner(new Scan());
                for (Result result : resultScanner) {
                    Delete delete = new Delete(result.getRow());
                    hTable.delete(delete);
                }
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(tablename);
                tableDesc.addFamily(new HColumnDescriptor("contents"));
                admin.createTable(tableDesc);
                System.out.println(tablename + " table created...");
                hTable = conn.getTable(tablename);
            }

            // insert test data
            for (int i = 0; i < 20; i++) {
                Put put = new Put(Bytes.toBytes(String.valueOf(i)));
                put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("name"), Bytes.toBytes("value" + i));
                hTable.put(put);
            }
            hTable.close();

            // Hadoop job setup
            Job job = Job.getInstance(conf, TableMapTest3.class.getSimpleName());
            job.setJarByClass(TableMapTest3.class);
            job.setOutputFormatClass(NullOutputFormat.class);

            Scan scan = new Scan();
            TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class, Text.class, Text.class, job);

            System.out.println("TableMapTest result:" + job.waitForCompletion(true));
        }
    }
  • I packaged my source into a jar and uploaded it to the cluster. Then I ssh'd into the master and ran my job:

    hadoop jar zz-0.0.1.jar com.ziki.zz.TableMapTest3


  • I got the following messages:
    url:hdfs://ip-xxx.ap-northeast-1.compute.internal:8020
    hbase.zookeeper.quorum:localhost
    TableMapTest table created...
    17/05/05 01:31:23 INFO impl.TimelineClientImpl: Timeline service address: http://ip-xxx.ap-northeast-1.compute.internal:8188/ws/v1/timeline/
    17/05/05 01:31:23 INFO client.RMProxy: Connecting to ResourceManager at ip-xxx.ap-northeast-1.compute.internal/172.31.4.228:8032
    17/05/05 01:31:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    17/05/05 01:31:31 INFO mapreduce.JobSubmitter: number of splits:1
    17/05/05 01:31:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493947058255_0001
    17/05/05 01:31:33 INFO impl.YarnClientImpl: Submitted application application_1493947058255_0001
    17/05/05 01:31:34 INFO mapreduce.Job: The url to track the job: http://ip-xxx.ap-northeast-1.compute.internal:20888/proxy/application_1493947058255_0001/
    17/05/05 01:31:34 INFO mapreduce.Job: Running job: job_1493947058255_0001
    17/05/05 01:31:57 INFO mapreduce.Job: Job job_1493947058255_0001 running in uber mode : false
    17/05/05 01:31:57 INFO mapreduce.Job: map 0% reduce 0%

    After a while, I got the error:
    17/05/05 01:42:26 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_0, Status : FAILED
    AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs
    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143

    17/05/05 01:52:56 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_1, Status : FAILED
    AttemptID:attempt_1493947058255_0001_m_000000_1 Timed out after 600 secs
    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143

    and some syslog:
    2017-05-05 01:31:59,664 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1493947058255_0001_m_000000 Task Transitioned from SCHEDULED to RUNNING
    2017-05-05 01:32:08,168 INFO [Socket Reader #1 for port 33348] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1493947058255_0001 (auth:SIMPLE)
    2017-05-05 01:32:08,227 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1493947058255_0001_m_000002 asked for a task
    2017-05-05 01:32:08,231 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1493947058255_0001_m_000002 given task: attempt_1493947058255_0001_m_000000_0
    2017-05-05 01:42:25,382 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs
    2017-05-05 01:42:25,389 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
    2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1493947058255_0001_01_000002 taskAttempt attempt_1493947058255_0001_m_000000_0
    2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1493947058255_0001_m_000000_0
    2017-05-05 01:42:25,394 INFO [ContainerLauncher #1] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : ip-xxx.ap-northeast-1.compute.internal:8041
    2017-05-05 01:42:25,457 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP
    2017-05-05 01:42:25,458 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
    2017-05-05 01:42:25,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
    2017-05-05 01:42:25,495 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved ip-xxx.ap-northeast-1.compute.internal to /default-rack
    2017-05-05 01:42:25,500 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node ip-xxx.ap-northeast-1.compute.internal
    2017-05-05 01:42:25,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
    2017-05-05 01:42:25,503 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1493947058255_0001_m_000000_1 to list of failed maps
    2017-05-05 01:42:25,557 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:3 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0
    2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1493947058255_0001: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit=<memory:1024, vCores:1> knownNMs=2
    2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1493947058255_0001_01_000002
    2017-05-05 01:42:25,583 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143
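The "Timed out after 600 secs" in these logs is the MapReduce task liveness timeout, not an HBase error: the mapper reported no progress for 10 minutes while it sat blocked. As a rough illustration (property name from Hadoop 2.x; the value shown is the stock default, which matches the log message), the limit comes from:

```xml
<!-- mapred-site.xml: a task is killed if it reports no progress for this long -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>600000</value> <!-- milliseconds, i.e. the 600 secs in the log -->
</property>
```

Raising this value would only delay the failure here, since the mapper is blocked rather than slow.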

    I'm just using the default settings and running a simple job. Why do these errors occur?
    If I've missed something, please let me know!
    Anyway, thanks for any help - it's appreciated!

    Best answer

    I found the answer: here

    You can't use HConfiguration (because it defaults to a localhost quorum). What you need to do instead is use the configuration Amazon sets up for you, located at /etc/hbase/conf/hbase-site.xml.
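On EMR, the file mentioned above carries the cluster's real ZooKeeper quorum. A sketch of the relevant entry (the hostname is an illustrative placeholder, not taken from a real cluster):

```xml
<!-- /etc/hbase/conf/hbase-site.xml on an EMR master node -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-xxx.ap-northeast-1.compute.internal</value>
</property>
```

Loading this file replaces the "localhost" quorum that the job printed at startup.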

    The connection code looks like this:

    // Load the cluster-provided config instead of the localhost defaults,
    // then pass this conf to the Connection and the Job.
    Configuration conf = new Configuration();
    String hbaseSite = "/etc/hbase/conf/hbase-site.xml";
    conf.addResource(new File(hbaseSite).toURI().toURL());
    HBaseAdmin.checkHBaseAvailable(conf);

    Regarding java - MapReduce input from HTable times out on AWS, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43796558/
