
hadoop - Executing a custom UDF in Hive

Reposted. Author: 可可西里. Updated: 2023-11-01 16:14:15

I have the following table in Hive:

>describe weblogs;
OK
originatingip string
clientidentity string
userid string
time string
requesttype string
requestpage string
httpprotocolversion string
responsecode int
responsesize int
referrer string
useragent string
Time taken: 1.065 seconds, Fetched: 11 row(s)

I wrote a UDF in Java that maps an IP address to a geographic location. Here is my UDF:

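package com.prithvi.hive.logprocessing.udf.ipgeo;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import com.maxmind.geoip.Location;
import com.maxmind.geoip.LookupService;

public class IpgeoHive extends UDF {

    Text result = new Text();
    String ipCountry, ipCity;

    public Text evaluate(Text input) throws IOException {
        if (input == null) return null;
        // Locate the GeoIP database on the classpath.
        URL database_path = getClass().getResource("/GeoLiteCity.dat");
        File file;
        try {
            file = new File(database_path.toURI());
        } catch (URISyntaxException e) {
            file = new File(database_path.getPath());
        }
        // Look up the IP address and build a "country/city" result.
        LookupService cl = new LookupService(file);
        Location location = cl.getLocation(input.toString());
        if (location != null) {
            ipCountry = location.countryName;
            ipCity = location.city;
        } else {
            ipCountry = "Unknown";
            ipCity = "Unknown";
        }
        result.set(ipCountry + "/" + ipCity);
        return result;
    }
}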

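When I test the UDF in Eclipse by passing dummy values, it returns the expected results.

After building the jar file, I ran it in my sandbox with the following commands: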

ADD JAR MapReduce_Examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar;

CREATE TEMPORARY FUNCTION IP2GEO AS 'com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive';

SELECT originatingip, IP2GEO(originatingip) from weblogs limit 10;

However, the job fails with the error below, and I don't know how to resolve it. Any help is much appreciated.

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row{"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":"\"GET","requestpage":"/","httpprotocolversion":"HTTP/1.1\"","responsecode":404,"responsesize":1081,"referrer":"\"-\"","useragent":"\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\""}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"originatingip":"25.198.250.35","clientidentity":"-","userid":"-","time":"[2014-07-19T16:05:33Z]","requesttype":"\"GET","requestpage":"/","httpprotocolversion":"HTTP/1.1\"","responsecode":404,"responsesize":1081,"referrer":"\"-\"","useragent":"\"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\""}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(org.apache.hadoop.io.Text) throws java.io.IOException on object com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive@63c0b9c3 of class com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive with arguments {25.198.250.35:org.apache.hadoop.io.Text} of size 1
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1241)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1217)
... 18 more
Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
at com.prithvi.hive.logprocessing.udf.ipgeo.IpgeoHive.evaluate(IpgeoHive.java:28)
... 23 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
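
The innermost `Caused by` in the trace is an `IllegalArgumentException: URI is not hierarchical` thrown from the `File` constructor. Below is a minimal sketch of how that exception can arise, assuming `getResource` returned a `jar:` URL because `GeoLiteCity.dat` was packaged inside the jar. The paths are hypothetical and `JarUriDemo` is an illustrative name, not part of the original code.

```java
import java.io.File;
import java.net.URI;

public class JarUriDemo {
    /** Returns the exception message from new File(uri), or null if the File was created. */
    public static String tryFile(URI uri) {
        try {
            new File(uri); // only inspects the URI; the file does not need to exist
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical locations, for illustration only.
        URI onDisk = URI.create("file:/tmp/GeoLiteCity.dat");
        URI inJar = URI.create("jar:file:/tmp/udf.jar!/GeoLiteCity.dat");

        // A plain file: URI is hierarchical, so File accepts it.
        System.out.println(tryFile(onDisk)); // prints "null"
        // A jar: URI is opaque, so File rejects it.
        System.out.println(tryFile(inJar));  // prints "URI is not hierarchical"
    }
}
```

This would also be consistent with the UDF working in Eclipse, where resources are typically loose files on disk and `getResource` returns a `file:` URL.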

Best answer

The error indicates that Hive does not recognize the Java String/Text value.

You have to convert the Java String/Text to a Hive string.

Use the code below.

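import org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaStringObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
// Field on the UDF class: the standard inspector for Java strings.
private JavaStringObjectInspector stringInspector = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
// Inside evaluate(): read the argument through the inspector.
String ip = stringInspector.getPrimitiveJavaObject(input);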
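
Independently of the answer above, the trace ends in `File.<init>` with `URI is not hierarchical`, which suggests the database could not be opened as a `File` because it lives inside the jar. One possible workaround, offered as a sketch and not as the accepted fix, is to read the resource as a stream and copy it to a temporary file. `ResourceExtractor` and `extractToTempFile` are hypothetical names; the demo uses `Object.class` only because that resource is available in a standard JVM.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class ResourceExtractor {
    /**
     * Copies a classpath resource to a temporary file and returns that file.
     * Reading through a stream works whether the resource is a loose file
     * or an entry inside a jar.
     */
    public static File extractToTempFile(Class<?> owner, String resourceName) throws IOException {
        try (InputStream in = owner.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            File tmp = File.createTempFile("resource-", ".tmp");
            tmp.deleteOnExit(); // clean up when the JVM exits
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo resource chosen only because it ships with the JVM.
        File copy = extractToTempFile(Object.class, "/java/lang/Object.class");
        System.out.println(copy.length() > 0);
    }
}
```

In the UDF, the equivalent call would presumably be `extractToTempFile(getClass(), "/GeoLiteCity.dat")`, with the returned file passed to `new LookupService(file)`. Doing this once and keeping the `LookupService` in a field, instead of repeating it for every row, would avoid copying the database on each call.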

Regarding hadoop - executing a custom UDF in Hive, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25394108/
