
csv - Hive CSV line delimiter configuration

Reposted. Author: 行者123. Updated: 2023-12-04 14:44:19

When creating an external table in Hive over CSV files,
you can use Hive's built-in delimited-text SerDe:

...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '...'
TBLPROPERTIES('serialization.null.format'='')

or the OpenCSV SerDe:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = '"', "escapeChar" = "\\" )
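
To make the three OpenCSV SerDe properties concrete, here is a rough analogy using Python's standard csv module. It is not the OpenCSV library that Hive uses: delimiter, quotechar and escapechar merely play the roles of separatorChar, quoteChar and escapeChar. The constant names and the parse_csv helper are hypothetical, and the values assume a comma-separated file.

```python
import csv
import io
from typing import List

# Analogues of the OpenCSVSerde properties (values assume a comma-separated file).
SEPARATOR_CHAR = ","   # plays the role of "separatorChar"
QUOTE_CHAR = '"'       # plays the role of "quoteChar"
ESCAPE_CHAR = "\\"     # plays the role of "escapeChar" (a single backslash)


def parse_csv(text: str) -> List[List[str]]:
    """Hypothetical helper: parse CSV text with Python's csv module."""
    reader = csv.reader(
        io.StringIO(text, newline=""),  # no newline translation, as csv expects
        delimiter=SEPARATOR_CHAR,
        quotechar=QUOTE_CHAR,
        escapechar=ESCAPE_CHAR,
    )
    return list(reader)
```

With these settings a quoted field such as "a,b" or an escaped a\,b stays a single field instead of being split at the comma.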

My question is: if I have a CSV file like this:
foo,bar,hello\rworld\rbaz,1\n
foo,bar,bye\rworld\rbaz,2\n
foo,bar,hi\rworld\rbaz,3\n
foo,bar,goodbye\rworld\rbaz,4\n

How can I configure the line ending to be \n and ignore \r, keeping it as part of the field?
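
A likely explanation for the behaviour in the question: Hadoop's default LineReader, which Hive's text input format relies on, accepts \n, \r or \r\n as a line terminator, so a bare \r inside a field also ends the record. The Python sketch below uses plain string handling rather than Hive code, and its function names are hypothetical. It contrasts that rule with splitting on \n only, using the sample rows from the question.

```python
import re
from typing import List

# Sample data from the question: a bare '\r' sits inside the third field
# and '\n' ends each record.
DATA = (
    "foo,bar,hello\rworld\rbaz,1\n"
    "foo,bar,bye\rworld\rbaz,2\n"
    "foo,bar,hi\rworld\rbaz,3\n"
    "foo,bar,goodbye\rworld\rbaz,4\n"
)


def _drop_trailing_empty(records: List[str]) -> List[str]:
    # A final terminator leaves one empty string at the end; it is not a record.
    if records and records[-1] == "":
        records.pop()
    return records


def split_any_line_ending(text: str) -> List[str]:
    """'\\r\\n', '\\r' and '\\n' all end a record (LineReader-style rule)."""
    return _drop_trailing_empty(re.split(r"\r\n|\r|\n", text))


def split_lf_only(text: str) -> List[str]:
    """Only '\\n' ends a record, so a bare '\\r' stays inside the field."""
    return _drop_trailing_empty(text.split("\n"))


for name, splitter in (("any", split_any_line_ending), ("lf-only", split_lf_only)):
    records = splitter(DATA)
    print(name, len(records), [len(r.split(",")) for r in records])
```

Splitting on any line ending turns the four rows into twelve fragments with 3, 1 and 2 fields each. Splitting on \n only yields the four intended rows of four fields each.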

Edit:

-> Trying LINES TERMINATED BY '\r\n' gives the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException 3:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\r\n''

Best answer

You can use LINES TERMINATED BY in your CREATE TABLE statement as follows:

...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '...'
TBLPROPERTIES('serialization.null.format'='')
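
The following Python sketch gives a rough model of what this DDL declares: records end at \n, fields split on ",", and an empty field reads as NULL. It parses the question's data. It is not Hive's SerDe. The four-column schema (three strings and an INT), the read_table name, and the padding and NULL-on-bad-int policies are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Settings taken from the answer's DDL.
LINE_DELIM = "\n"    # LINES TERMINATED BY '\n'
FIELD_DELIM = ","    # FIELDS TERMINATED BY ','
NULL_FORMAT = ""     # TBLPROPERTIES('serialization.null.format'='')
NUM_COLUMNS = 4      # hypothetical schema: c1, c2, c3 STRING, c4 INT

Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[int]]


def _to_int(value: Optional[str]) -> Optional[int]:
    # Sketch policy: NULL or a non-numeric value becomes None.
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def read_table(text: str) -> List[Row]:
    """Hypothetical helper: split records on '\\n' only, then fields on ','."""
    rows: List[Row] = []
    for record in text.split(LINE_DELIM):
        if not record:  # skip empty records, e.g. the one after the final '\n'
            continue
        values: List[Optional[str]] = [
            None if v == NULL_FORMAT else v for v in record.split(FIELD_DELIM)
        ]
        # Sketch policy: pad missing columns with None, drop extra fields.
        values = (values + [None] * NUM_COLUMNS)[:NUM_COLUMNS]
        rows.append((values[0], values[1], values[2], _to_int(values[3])))
    return rows
```

On the question's first row it returns ("foo", "bar", "hello\rworld\rbaz", 1), with both \r characters kept inside the third field.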

Regarding csv - Hive CSV line delimiter configuration, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55103936/
