hadoop - How to query HDFS part files from Apache Drill


I am trying to query my HDFS file system from Apache Drill.
I can successfully query Hive tables and CSV files, but part files do not work.

hadoop fs -cat BANK_FINAL/2015-11-02/part-r-00000 | head -1

gives the result:

028|S80306432|2015-11-02|BRN-CLG-CHQ 支付给 SILVER ROCK BANDRA CO-OP|485|ZONE SERIAL [ 485]|L|I|MAHARASHTRA STATE CO-OP BANK LTD|3320.0|INWARD CLG|D11528 |SBPRM
select * from dfs.`/user/ituser1/e.csv` limit 10 

works fine and returns results successfully.

But when I try to query
select * from dfs.`/user/ituser1/BANK_FINAL/2015-11-02/part-r-00000` limit 10

it gives the error:

org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs./user/ituser1/BANK_FINAL/2015-11-02/part-r-00000' not found [Error Id: 6f80392a-51af-4b61-94d8-335b33b0048c on genome-dev13.axs:31010]



The Apache Drill dfs storage plugin JSON is as follows:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://10.9.1.33:8020/",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": ["psv"],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": ["tsv"],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": ["seq"]
    },
    "csvh": {
      "type": "text",
      "extensions": ["csvh"],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}

Best Answer

Drill uses the file extension to determine the file type, except for Parquet files, where it tries to read a magic number from the file itself. In your case, you need to define "defaultInputFormat" to indicate that, by default, any file without an extension should be treated as a delimited text file. You can find more information here:

https://drill.apache.org/docs/drill-default-input-format/
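Since "defaultInputFormat" is null for both workspaces in the plugin configuration above, and the sample row shows the part files are pipe-delimited, one sketch of the fix (an assumption based on that sample, not confirmed by the answerer) is to point the root workspace's default input format at the "psv" format the plugin already defines:

```json
"workspaces": {
  "root": {
    "location": "/",
    "writable": true,
    "defaultInputFormat": "psv"
  }
}
```

After saving the updated plugin in the Drill web UI, the original `select * from dfs.`…`part-r-00000` limit 10` query should read the extension-less part file as pipe-delimited text.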

Regarding "hadoop - How to query HDFS part files from Apache Drill", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36079870/
