
java - Reading an HDFS file from a Hive UDF - Execution Error, return code -101 from FunctionTask. Could not initialize class

Reposted · Author: 可可西里 · Updated: 2023-11-01 15:01:00

We have been trying to create a simple Hive UDF to mask certain fields in a Hive table. We use an external file (stored on HDFS) to fetch a piece of text that salts the masking process. Everything seems to work, but when we try to create the function it throws this error:

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask

Here is our UDF code:

package co.company;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.commons.codec.digest.DigestUtils;

@Description(
    name = "masker",
    value = "_FUNC_(str) - mask a string",
    extended = "Example: \n" +
        " SELECT masker(column) FROM hive_table; "
)
public class Mask extends UDF {

    private static final String arch_clave = "/user/username/filename.dat";
    private static String clave = null;

    public static String getFirstLine(String arch) {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path(arch));
            BufferedReader br = new BufferedReader(new InputStreamReader(in));

            String ret = br.readLine();
            br.close();
            return ret;

        } catch (Exception e) {
            System.out.println("out: Error Message: " + arch + " exc: " + e.getMessage());
            return null;
        }
    }

    public Text evaluate(Text s) {
        clave = getFirstLine(arch_clave);

        Text to_value = new Text(DigestUtils.shaHex(s + clave));
        return to_value;
    }
}
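The masking step itself (`DigestUtils.shaHex(s + clave)`) is just a salted SHA-1 hex digest. For readers without commons-codec on the classpath, a stdlib-only equivalent (a sketch, not part of the original UDF) looks like:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedMask {
    // SHA-1 hex digest of value + salt, mirroring DigestUtils.shaHex(s + clave)
    public static String mask(String value, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest((value + salt).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex chars per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is mandated by the JDK spec, so this is effectively unreachable
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

In newer commons-codec releases `shaHex` is deprecated in favor of `sha1Hex`; both return the same 40-character hex string.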

We are uploading the jar file and creating the UDF through HUE's interface (unfortunately, we do not yet have console access to the Hadoop cluster).

In Hue's Hive interface, our command is:

add jar hdfs:///user/my_username/myJar.jar

Then, to create the function, we run:

CREATE TEMPORARY FUNCTION masker as 'co.company.Mask';

Unfortunately, the error thrown when we try to create the UDF is not very helpful. Here is the log of the UDF creation. Any help is greatly appreciated. Thank you very much.

14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.ParseDriver: Parsing command: CREATE TEMPORARY FUNCTION enmascarar as 'co.bancolombia.analitica.Enmascarar'
14/12/10 08:32:15 INFO parse.ParseDriver: Parse Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=parse start=1418218335753 end=1418218335754 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.FunctionSemanticAnalyzer: analyze done
14/12/10 08:32:15 INFO ql.Driver: Semantic Analysis Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1418218335754 end=1418218335757 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=compile start=1418218335753 end=1418218335757 duration=4 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO lockmgr.DummyTxnManager: Creating lock manager of type org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
14/12/10 08:32:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=server1.domain:2181,server2.domain.corp:2181,server3.domain:2181 sessionTimeout=600000 watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher@2ebe4e81
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1418218335760 end=1418218335797 duration=37 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Starting command: CREATE TEMPORARY FUNCTION enmascarar as 'co.company.Mask'
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1418218335760 end=1418218335798 duration=38 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=task.FUNCTION.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR ql.Driver: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.MasK
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1418218335797 end=1418218335800 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager: about to release lock for default
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager: about to release lock for colaboradores
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1418218335800 end=1418218335822 duration=22 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR operation.Operation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Best answer

This problem was solved, but it had nothing to do with the code. The code above reads a file from HDFS in a Hive UDF just fine (very inefficiently, since it re-reads the file every time evaluate is called, but it does read it).
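Since re-reading the salt file on every `evaluate` call is wasteful, one way to avoid it (a hypothetical sketch, not from the original post) is to cache the value after a single read, with the HDFS lookup injected as a supplier:

```java
import java.util.function.Supplier;

// Hypothetical sketch: load an expensive value (e.g. the salt read from HDFS
// via getFirstLine) at most once per JVM, instead of once per evaluate() call.
public class CachedSalt {
    private final Supplier<String> loader; // e.g. () -> getFirstLine(arch_clave)
    private volatile String cached;        // null until the first lookup

    public CachedSalt(Supplier<String> loader) {
        this.loader = loader;
    }

    public String get() {
        String local = cached;
        if (local == null) {
            synchronized (this) {
                if (cached == null) {
                    cached = loader.get(); // the expensive read happens here, once
                }
                local = cached;
            }
        }
        return local;
    }
}
```

In the UDF, `evaluate` would then call `salt.get()` instead of `getFirstLine(arch_clave)`, so the HDFS read happens only on the first row.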

It turns out that when you create a Hive UDF through HUE, you upload the jar and then create the function. However, if you change the function and re-upload the jar, Hive still keeps the previously defined function.

We defined the same UDF class in a different package inside the jar, dropped the original function in Hive, and created the function again through HUE (pointing at the new class):

add jar hdfs:///user/my_username/myJar2.jar;
drop function if exists masker;
create temporary function masker as 'co.company.otherpackage.Mask';

It seems a bug report is warranted against Hive (or HUE, or Thrift?); I still need to understand better which part of the system is at fault.

I hope this helps someone in the future.

Regarding "java - Reading an HDFS file from a Hive UDF - Execution Error, return code -101 from FunctionTask. Could not initialize class", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27402442/
