gpt4 book ai didi

hadoop - 从HIVE表中提取单个列

转载 作者:行者123 更新时间:2023-12-02 21:37:03 25 4
gpt4 key购买 nike

以下是来自HIVE表的选择查询:

select *  from test_aviation limit 5;                                                   
OK
2015 1 1 1 4 2015-01-01 AA 19805 AA N787AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10
2015 1 1 2 5 2015-01-02 AA 19805 AA N795AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0850 -10.00 0.00 0.00 -1 0900-0959 15.00 0905 1202 9.00 1230 1211 -19.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 381.00 357.00 1.00 2475.00 10
2015 1 1 3 6 2015-01-03 AA 19805 AA N788AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 15.00 0908 1138 13.00 1230 1151 -39.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 358.00 330.00 1.00 2475.00 10
2015 1 1 4 7 2015-01-04 AA 19805 AA N791AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 14.00 0907 1159 19.00 1230 1218 -12.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 385.00 352.00 1.00 2475.00 10
2015 1 1 5 1 2015-01-05 AA 19805 AA N783AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 27.00 0920 1158 24.00 1230 1222 -8.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 389.00 338.00 1.00 2475.00 10
Time taken: 0.067 seconds, Fetched: 5 row(s)

HIVE表的结构
hive> describe test_aviation;
OK
col_value string
Time taken: 0.221 seconds, Fetched: 1 row(s)

我想将整个表分为不同的列。我编写了如下查询以提取第12列:
SELECT regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 12)  from test_aviation;

输出:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437067221195_0008, Tracking URL = http://localhost:8088/proxy/application_1437067221195_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1437067221195_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-17 02:46:56,215 Stage-1 map = 0%, reduce = 0%
2015-07-17 02:47:27,650 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1437067221195_0008 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:8088/proxy/application_1437067221195_0008/
Examining task ID: task_1437067221195_0008_m_000000 (and more) from job job_1437067221195_0008

Task with the most failures(4):
-----
Task ID:
task_1437067221195_0008_m_000000

URL:
http://localhost:8088/taskdetails.jsp?jobid=job_1437067221195_0008&tipid=task_1437067221195_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\tAA\t19805\tAA\tN787AA\t1\tJFK\tNew York\t NY\tNY\t36\tNew York\t22\tLAX\tLos Angeles\t CA\tCA\t06\tCalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\tAA\t19805\tAA\tN787AA\t1\tJFK\tNew York\t NY\tNY\t36\tNew York\t22\tLAX\tLos Angeles\t CA\tCA\t06\tCalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@4def4616 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments {2015 1 1 1 4 2015-01-01 AA 19805 AA N787AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10 :java.lang.String, ^(?:([^,]*),?){1}:java.lang.String, 12:java.lang.Integer} of size 3
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1243)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1219)
... 18 more
Caused by: java.lang.IndexOutOfBoundsException: No group 12
at java.util.regex.Matcher.group(Matcher.java:487)
at org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(UDFRegExpExtract.java:56)
... 23 more

请帮助我从HIVE表中提取不同的列。

最佳答案

试试这个:

select split(col_value,' ')[11] as column_12 from test_aviation;

假设您有空格分隔符。
'\\t' if tab 
'\\|' for pipe...
':'

等等

关于hadoop - 从HIVE表中提取单个列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31464886/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com