
hadoop - Exporting Hive query output to HDFS in sequence file format

Reposted · Author: 行者123 · Updated: 2023-12-02 21:36:47

I am trying to run a Hive query and export its output to HDFS in SEQUENCEFILE format.

beeline> show create table test_table;

+--------------------------------------------------------------------------------------+
| createtab_stmt |
+--------------------------------------------------------------------------------------+
| CREATE TABLE `test_table`( |
| `XXXXXXXXXXXXXX` bigint, |
| `XXXXXXXXXXXxx` int, |
| `XXXXXXXXX` int, |
| `XXXXXX` int) |
| PARTITIONED BY ( |
| `XXXXXXXX` string, |
| `XXXX` string, |
| `XXXXXXXX` string) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '\u0001' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.SequenceFileInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' |
| LOCATION |
| 'hdfs://localhost:8020/user/hive/warehouse/local_hive_report.db/test_table' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1437569941') |
+--------------------------------------------------------------------------------------+

Here is the query I am using to export the data:
beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date' 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
SELECT * FROM test_table WHERE column=value;

And here is the error:
    Error: Error while compiling statement: FAILED: ParseException line 1:61 
cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in statement (state=42000,code=40000)

Am I missing something here?

Software versions:
Cloudera Hadoop CDH 5.3.3,
Apache Hive 0.13.1.

Edit:
Added my interim workaround below.

Best Answer

This is because Hive writes query output with ^A (\u0001) as the field delimiter by default, and in Hive 0.13 the ROW FORMAT DELIMITED clause of INSERT OVERWRITE DIRECTORY is only accepted for LOCAL directories — hence the ParseException at the 'ROW' keyword (custom separators for non-local directories were added in a later release, tracked as HIVE-5672).
You can try the same export against the local filesystem instead:

beeline> INSERT OVERWRITE LOCAL DIRECTORY '/user/~local directoryname' 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
SELECT * FROM test_table WHERE column=value;
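
If the files ultimately need to land on HDFS with a custom delimiter, another common workaround on Hive 0.13 is to stage the result in a table stored as a sequence file and pick up the files from that table's warehouse directory afterwards. The sketch below assumes a hypothetical staging table name (export_staging); it is an illustration, not a tested recipe:

    -- Sketch: stage the query result in a table stored as SEQUENCEFILE.
    -- export_staging is a hypothetical placeholder name.
    CREATE TABLE export_staging
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
    STORED AS SEQUENCEFILE
    AS
    SELECT * FROM test_table WHERE column=value;

The sequence files are then available on HDFS under the new table's location (by default a subdirectory of the database's warehouse path, alongside test_table's), from where they can be copied or consumed directly.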

Regarding "hadoop - Exporting Hive query output to HDFS in sequence file format", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31645847/
