
java - Hadoop 2.4 cannot start a job on AWS s3n

Reposted. Author: 行者123. Updated: 2023-12-02 21:46:02

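I deployed Hadoop 2.4 on AWS EC2 using the S3 native filesystem (s3n) as a replacement for HDFS. I tried several example applications, and all of them gave me the following stack trace. (An older thread from July 24 stalled without a resolution, so I am attaching the debug output here.)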

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount s3n://mybkt/wc/ s3n://mybkt/out

14/08/12 21:57:35 DEBUG util.Shell: setsid exited with exit code 0
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
14/08/12 21:57:36 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
14/08/12 21:57:36 DEBUG security.Groups: Creating new Groups object
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: java.library.path=/home/ubuntu/hadoop-2.4.0/lib
14/08/12 21:57:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
14/08/12 21:57:36 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login commit
14/08/12 21:57:36 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: ubuntu
14/08/12 21:57:36 DEBUG security.UserGroupInformation: UGI loginUser:ubuntu (auth:SIMPLE)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.https-only=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: storage-service.internal-error-retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.default-storage-class=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.server-side-encryption=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.user-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.product-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.requester-pays-buckets-enabled=false
14/08/12 21:57:36 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
14/08/12 21:57:36 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
14/08/12 21:57:37 INFO client.RMProxy: Connecting to ResourceManager at /172.31.20.187:8032
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:130)
14/08/12 21:57:37 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
14/08/12 21:57:37 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
14/08/12 21:57:37 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@7d66036e
14/08/12 21:57:37 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@71cebfd2
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3n
14/08/12 21:57:37 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Here are my configuration files:

yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>172.31.20.187:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>172.31.20.187:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>172.31.20.187:8030</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/ubuntu/hdfs/tmp</value>
</property>

mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.map.memory.mb</name>
<value>640</value>
<description>Larger resource limit for maps.</description>
</property>

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>640</value>
<description>Larger resource limit for reduces.</description>
</property>

<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>

<property>
<name>mapreduce.jobtracker.address</name>
<value>172.31.20.187:8021</value>
</property>

I also configured access control for AWS S3 (in core-site.xml) by following this link:
https://wiki.apache.org/hadoop/AmazonS3

core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>s3n://mybkt</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>

<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>123</value>
</property>

<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>456</value>
</property>

I also tried Hadoop v1, and it turns out that Hadoop 1 works on the s3n filesystem. It just does not seem to work with Hadoop v2.

Please help. Thanks in advance.

Best answer

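I don't think using s3, or any other FileSystem implementation, as a replacement for HDFS/NameNode can be this simple.
I tried the same thing with the Tachyon filesystem and failed; see https://groups.google.com/forum/#!topic/tachyon-users/u4OoBekGigA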

I think others got it working by:

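  • adding AbstractFileSystem implementations
  • patching Hadoop so that it does not check the permissions of staging artifacts

So the bottom line is: you can use s3 to read your jobs' input and write their output, but using it as an HDFS replacement that handles the metadata layer of execution is not supported out of the box!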
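
One way to act on that bottom line is to leave HDFS as the default filesystem and reach S3 only through explicit `s3n://` paths. The fragment below is a minimal sketch of such a core-site.xml, not a tested configuration. The NameNode address `hdfs://namenode.example.internal:9000` is a hypothetical placeholder, and the sketch assumes an HDFS NameNode and DataNodes are actually running on the cluster. The credential values are the placeholders used in the question.

```xml
<!-- core-site.xml (sketch; the NameNode host and port below are hypothetical) -->
<property>
  <!-- Default filesystem stays on HDFS, so job staging and metadata go there -->
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.internal:9000</value>
</property>

<property>
  <!-- s3n credentials are kept so that explicit s3n:// paths still resolve -->
  <name>fs.s3n.awsAccessKeyId</name>
  <value>123</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>456</value>
</property>
```

With a default filesystem of this kind, the `No AbstractFileSystem for scheme: s3n` lookup seen in the question's log should no longer be triggered by the default filesystem, while the wordcount command from the question can keep `s3n://mybkt/wc/` and `s3n://mybkt/out` as its input and output paths.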

Regarding "java - Hadoop 2.4 cannot start a job on AWS s3n", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25275194/