
hadoop - Error copying from S3 to HDFS


I am trying to copy some files from an S3 bucket to the HDFS of my EMR cluster, but I am getting the following error:

Exception in thread "main" java.lang.RuntimeException: Error running job
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:771)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:580)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://10.87.26.26:9000/tmp/33e4f3b9-d29a-49e8-9706-ea70e07e3ff2/files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:751)
... 9 more

The command I am using is:
./elastic-mapreduce --jobflow  j-12345678 --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3n://my-bucket/data/,--dest,hdfs:///data/in,--srcPattern,xyz01-1-1*ped*' --step-name "Copy input files to HDFS" --wait-for-steps

I tried running the sample word count job to check whether there was any problem with HDFS, and it ran fine.

Can anyone help me with this? If more information is needed, please let me know and I will update the description.

Best Answer

The culprit is usually the --srcPattern '<regex>' argument: it is a regular expression matched against the full S3 path, not a shell glob, so a pattern like xyz01-1-1*ped* that does not begin with .* will match nothing. S3DistCp stages the list of matched source files under the /tmp/<uuid>/files path shown in the stack trace, so when the pattern matches no files, the job fails with the "Input path does not exist" error above. Prefixing the pattern with .* (any character, zero or more times) relaxes the match. You can also test a single file with hadoop fs -cp s3://src/file1.something /my/output/path/ and refine your regex from there.
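As a minimal sketch (the job flow ID is copied from the question; the single file name is hypothetical), you could first verify access by copying one known object, then rerun s3distcp with the pattern anchored by .* and the globs rewritten as regex:

# Sanity check: copy a single known object into HDFS (file name is hypothetical)
hadoop fs -cp s3n://my-bucket/data/xyz01-1-1-sample.ped /data/in/

# Rerun the copy step with a relaxed regex: .* before and between the literals
./elastic-mapreduce --jobflow j-12345678 \
  --jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
  --args '--src,s3n://my-bucket/data/,--dest,hdfs:///data/in,--srcPattern,.*xyz01-1-1.*ped.*' \
  --step-name "Copy input files to HDFS" --wait-for-steps

If the relaxed pattern copies too much, tighten it incrementally; starting broad and narrowing down is easier than guessing why a strict pattern matched nothing.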

It would be great to know whether regex non-matches are logged, and where.

Regarding hadoop - Error copying from S3 to HDFS, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22925225/
