
hadoop - Amazon EMR and Hive: Getting a "java.io.IOException: Not a file" exception when loading subdirectories to an external table

Reposted. Author: 可可西里. Updated: 2023-11-01 14:42:38

I am using Amazon EMR. I have some log data in S3, all in the same bucket but under different subdirectories, like this:

"s3://bucketname/2014/08/01/abc/file1.bz"
"s3://bucketname/2014/08/01/abc/file2.bz"
"s3://bucketname/2014/08/01/xyz/file1.bz"
"s3://bucketname/2014/08/01/xyz/file3.bz"

I am using these settings:

Set hive.mapred.supports.subdirectories=true;
Set mapred.input.dir.recursive=true;

When I try to load all the data from "s3://bucketname/2014/08/":

CREATE EXTERNAL TABLE table1(id string, at string, 
custom struct<param1:string, param2:string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucketname/2014/08/';

Hive returns:

OK
Time taken: 0.169 seconds

When I try to query the table:

SELECT * FROM table1 LIMIT 10;

I get:

Failed with exception java.io.IOException:java.io.IOException: Not a file: s3://bucketname/2014/08/01

Does anyone know how to solve this?

Best answer

This is an EMR-specific issue. Here is what I got from Amazon support:

Unfortunately Hadoop does not recursively check the subdirectories of Amazon S3 buckets. The input files must be directly in the input directory or Amazon S3 bucket that you specify, not in sub-directories. According to this document ("Are you trying to recursively traverse input directories?"), it looks like EMR does not support recursive directories at the moment. We are sorry about the inconvenience.
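Since the support reply says input files must sit directly in the directory a table (or partition) points at, one possible workaround is to register each leaf directory as its own partition. The following is only a minimal sketch under stated assumptions, not something the original answer proposes: it assumes table1 is recreated with a PARTITIONED BY clause using the hypothetical column names yr, mon, dy, and src, and that the list of S3 keys has already been obtained by some other means. The helper names are illustrative as well. It only generates the HiveQL text; whether this avoids the error on a particular EMR release is not confirmed by the source.

```python
def leaf_directories(keys):
    """Return the sorted, de-duplicated parent prefixes of the given S3 keys."""
    return sorted({key.rsplit("/", 1)[0] for key in keys})


def add_partition_statements(table, bucket, keys):
    """Build one ALTER TABLE ... ADD PARTITION statement per leaf directory.

    Assumes keys follow the year/month/day/source/file layout from the
    question; the partition column names (yr, mon, dy, src) are hypothetical.
    """
    statements = []
    for prefix in leaf_directories(keys):
        parts = prefix.split("/")
        if len(parts) != 4:
            continue  # skip keys that do not match the assumed layout
        year, month, day, source = parts
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION "
            "(yr='{y}', mon='{m}', dy='{d}', src='{s}') "
            "LOCATION 's3://{b}/{p}/';".format(
                t=table, y=year, m=month, d=day, s=source, b=bucket, p=prefix
            )
        )
    return statements


# Keys taken from the layout shown in the question (bucket name omitted).
keys = [
    "2014/08/01/abc/file1.bz",
    "2014/08/01/abc/file2.bz",
    "2014/08/01/xyz/file1.bz",
    "2014/08/01/xyz/file3.bz",
]

for statement in add_partition_statements("table1", "bucketname", keys):
    print(statement)
```

Each generated statement points a partition at a directory that contains files directly, which matches the constraint described in the support reply. The statements would then be run in Hive against the partitioned version of the table.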

Regarding "hadoop - Amazon EMR and Hive: Getting a "java.io.IOException: Not a file" exception when loading subdirectories to an external table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25708240/
