
java - Pig: How to pass a relation to a Java UDF as an argument?

Reposted. Author: 行者123. Updated: 2023-12-01 10:22:24

My Pig script needs to pass data to a Java constructor:

UPCFIND = LOAD 'testdatabase.item' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (upc:chararray,description:chararray); 
UPCDATA = FOREACH UPCFIND GENERATE upc,description;
DUMP UPCDATA;
-- output:
(00001123456789," Table ")
(00000123456789," PICTURE ")

My UDF call is:

loading = LOAD '/incoming/files/*' USING com.readingitems.loading.TheLoader(UPCDATA) as
(upc:chararray, description:chararray,

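Can I pass UPCDATA to my UDF? If so, how do I put it into a HashMap where upc is the key and description is the value? Is this treated as an ArrayList or as a tuple? Thanks in advance!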

The problem now is passing this data to the Java constructor:

UPCFIND = LOAD 'testdatabase.item' USING org.apache.hive.hcatalog.pig.HCatLoader() AS (upc:chararray,description:chararray);
UPCDATA = FOREACH UPCFIND GENERATE upc,description;
UPCDATA_SCALAR = GROUP UPCDATA ALL;

loading = LOAD 'files/incoming/*' USING com.readingitems.loading.TheLoader(UPCDATA_SCALAR)

I get this error:

ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : UPCDATA_SCALAR

Dumping UPCDATA_SCALAR produces the correct result.

The reason why I'm doing this is to load a hive table's data into a Loader function that's parsing files. I need to compare data in the files to the Hive table data in order to make changes and insert into a new table.

My loader function begins with:

public class TheLoader extends LoadFunc {

public TheLoader (DataBag item_master_stream) throws SQLException {

Best answer

In your example, UPCDATA is a relation. To pass it into a function as an argument, you have to convert it into a scalar. You can do this with:

UPCDATA_SCALAR = GROUP UPCDATA ALL;

In Java, this will be represented as a Tuple containing a DataBag. You can read more about it here.

Keep in mind that GROUP ALL is very expensive, so you should project out every column that is not important to your UDF's function before grouping.
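
For the HashMap part of the question, here is a minimal sketch of the mapping step. It is not the asker's loader: plain `String[]` rows stand in for Pig tuples so the example runs without the Pig jars, the names `UpcLookup` and `toMap` are hypothetical, and trimming the description is an assumption based on the padded values in the sample output. In a real UDF the rows would come from iterating the DataBag and reading each field from its Tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UpcLookup {

    // Builds a upc -> description map from rows shaped like (upc, description).
    // Each String[] is a stand-in for one Pig tuple (assumption for this sketch).
    static Map<String, String> toMap(List<String[]> rows) {
        Map<String, String> byUpc = new HashMap<>();
        for (String[] row : rows) {
            if (row == null || row.length < 2 || row[0] == null) {
                continue; // skip malformed rows instead of failing the whole load
            }
            // Trimming is an assumption; drop it if the padding is significant.
            String description = row[1] == null ? "" : row[1].trim();
            byUpc.put(row[0], description);
        }
        return byUpc;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"00001123456789", " Table "});
        rows.add(new String[] {"00000123456789", " PICTURE "});

        Map<String, String> byUpc = toMap(rows);
        System.out.println(byUpc.get("00001123456789"));
    }
}
```

If the same upc appears more than once, the later row overwrites the earlier one. Whether that is acceptable depends on the Hive table, which the question does not specify.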

A similar question about "java - Pig: How to pass a relation to a Java UDF as an argument?" can be found on Stack Overflow: https://stackoverflow.com/questions/35494243/
