
hadoop - How to tune data upload from (non-EMR) Hive to S3?

Reposted. Author: 行者123. Updated: 2023-12-02 21:19:23

I want to copy data from a Hive table on a bare-metal Hadoop cluster to an Amazon S3 bucket.

I understand that I can do the following:

hive> create external table my_table (
    >   `column1` string,
    >   `column2` string,
    >   ...
    >   `columnX` string)
    > row format delimited fields terminated by ','
    > lines terminated by '\n'
    > stored as textfile
    > location 's3n://my_bucket/my_folder_path/';

hive> insert into table my_table select * from source_db.source_table;

This works for small amounts of data. However, when I try it with a larger dataset, the job fails with an error (stack trace below).

I am looking for help on tuning this process, or for other options.

Thanks in advance.
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: n must be positive
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:102)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:117)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
... 9 more

Best Answer

Hive is clearly telling you what the problem is. Look at these two lines:

  • Hive Runtime Error while processing row
  • IllegalArgumentException: n must be positive

So small dataset vs. large dataset is not the issue in itself. Rather, the large dataset contains some rows that Hive cannot process.

However, it is hard to pinpoint the exact problem from the information you posted. I suggest you break the large dataset into smaller chunks and try to narrow the problem down.
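The chunking suggestion above can be sketched as a small helper that emits one INSERT statement per disjoint slice of the source table. This is only an illustration: the split column (`column1` here), the chunk count, and the helper name are assumptions, not something from the original question; pick any column with a reasonably even value distribution. The generated predicates use Hive's built-in `hash` and `abs` functions.

```python
# Sketch (hypothetical helper): generate one HiveQL INSERT per chunk.
# Assumption: `split_column` exists in the source table and hashes to a
# reasonably even distribution, so each chunk moves a similar row count.
def chunked_insert_statements(target, source, split_column, num_chunks):
    """Return a list of INSERT statements, each copying the slice of
    rows whose hashed split-column value falls in chunk i."""
    stmts = []
    for i in range(num_chunks):
        stmts.append(
            f"insert into table {target} "
            f"select * from {source} "
            f"where abs(hash(`{split_column}`)) % {num_chunks} = {i};"
        )
    return stmts

# Example: split the original copy into 4 slices.
for stmt in chunked_insert_statements(
        "my_table", "source_db.source_table", "column1", 4):
    print(stmt)
```

Running each generated statement separately (e.g. via `hive -e "<stmt>"`) lets you see which slice fails, and then bisect that slice further until the offending rows are isolated.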

Regarding "hadoop - How to tune data upload from (non-EMR) Hive to S3?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37842229/
