
java - Are there known implementation issues with NULLIF in some versions of Hive?

Reposted by 行者123, last updated 2023-12-01 18:31:26

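I'm using EMR 5.19 with Hive 2.3.3, and I'm running into a problem with NULLIF: a Java String cannot be converted to a Hadoop Text (or vice versa). The source data is read with AWS's CloudTrail SerDe, which looks solidly written. The problem appears to come from the built-in NULLIF UDF, as you can see in the error message: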

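I'm testing whether the result of a regex extraction is an empty string, and if it is, I want NULL instead. My column looks roughly like NULLIF(REGEXP_EXTRACT(key,'([^\/]+)(\/\d+)?(\/.*)', 1), '') AS key_prefix, but I get errors like the following: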

2020-02-11 11:06:34,034 INFO [IPC Server handler 26 on 43627] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1574116917806_1754132_r_000008_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating NULLIF(regexp_extract(_col2, '(^[^\/]*)\/(\d\/)?([^\/][^\/]+)', 1),'')
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:820)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:834)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:837)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:938)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:264)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
... 7 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.comparePrimitiveObjects(PrimitiveObjectInspectorUtils.java:421)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:93)
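The innermost frames show the failure mechanism: WritableStringObjectInspector.getPrimitiveWritableObject received a java.lang.String where it expected an org.apache.hadoop.io.Text. The following is a minimal plain-Java sketch of that kind of unchecked cast. It uses a hypothetical stand-in class, FakeText, instead of Hadoop's Text so that it runs without Hadoop on the classpath; it only illustrates the exception and does not establish which operator handed NULLIF the String.

```java
public class WritableCastDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Text (NOT the Hadoop class),
    // so the example runs without Hadoop on the classpath.
    static final class FakeText {
        private final String value;

        FakeText(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Mimics an inspector that assumes its input is already the writable type
    // and casts without checking; a java.lang.String input fails at runtime.
    static FakeText getWritable(Object o) {
        return (FakeText) o;
    }

    // Returns true if passing o to getWritable throws ClassCastException.
    static boolean castFails(Object o) {
        try {
            getWritable(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new FakeText("abc"))); // false
        System.out.println(castFails("abc"));               // true
    }
}
```

The cast compiles because the parameter is typed as Object; the mismatch only surfaces at runtime, which is why the error appears during query execution rather than at query compilation.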

Best answer

This may not answer your question directly, but hopefully it helps.

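  1. regexp_extract returns an empty string '' if the regex does not match; it returns null only when the source string is null. So using NULLIF here does not look right.
  2. Use double backslashes to escape special characters in Hive regular expressions, e.g. \\d.
  3. / is not a special character and does not need to be escaped.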
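Points 1 to 3 can be checked with a small Java emulation. Hive's regexp_extract is built on java.util.regex, but the method below is an illustrative sketch of the contract described in point 1 (null input gives null, no match gives ''), not Hive's actual source. The regex is the one from the error message, written with \\d and an unescaped /, per points 2 and 3; the sample inputs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractDemo {
    // Sketch of the behaviour described in point 1 (an emulation, not Hive's code):
    // null input -> null, no match -> "", otherwise the requested capture group.
    static String regexpExtract(String s, String regex, int group) {
        if (s == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(group) : "";
    }

    // In Java source, as in a HiveQL string literal, the regex token \d must be
    // written as "\\d"; "/" is not a regex metacharacter and needs no escaping.
    static final String PREFIX_REGEX = "(^[^/]*)/(\\d/)?([^/][^/]+)";

    public static void main(String[] args) {
        System.out.println(regexpExtract("abc/7/def", PREFIX_REGEX, 1));               // abc
        System.out.println(regexpExtract("no-slash-here", PREFIX_REGEX, 1).isEmpty()); // true
        System.out.println(regexpExtract(null, PREFIX_REGEX, 1));                      // null
    }
}
```

Because a non-matching row yields '' rather than null, an empty-to-null conversion is the only step needed to get null for "no match".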

I suggest a macro like this:

CREATE TEMPORARY MACRO normalize_null(s string) CASE WHEN s!='' THEN s END;

It converts an empty string to NULL, leaves NULL as NULL, and returns everything else unchanged.
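The macro relies on SQL three-valued logic: when s is NULL, s != '' evaluates to NULL rather than true, so the CASE falls through to its implicit ELSE NULL. Here is a minimal Java emulation of those semantics (an illustration only, not how Hive evaluates the macro). The results match what NULLIF(s, '') is meant to produce, but the CASE expression does not go through GenericUDFNullif, the UDF class that appears in the stack trace.

```java
public class NormalizeNullDemo {
    // Java emulation of: CASE WHEN s != '' THEN s END
    // In SQL, NULL != '' is NULL (not true), so a NULL input falls through to the
    // implicit ELSE NULL; an empty string fails the test and also yields NULL.
    static String normalizeNull(String s) {
        if (s != null && !s.isEmpty()) {
            return s;      // WHEN branch taken
        }
        return null;       // implicit ELSE NULL
    }

    public static void main(String[] args) {
        System.out.println(normalizeNull(""));    // null
        System.out.println(normalizeNull(null));  // null
        System.out.println(normalizeNull("abc")); // abc
    }
}
```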

Regarding "java - Are there known implementation issues with NULLIF in some versions of Hive?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60161108/
