
datetime - Pig - Could not infer the matching function for org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix as multiple or none of them fit

Reposted. Author: 可可西里. Updated: 2023-11-01 14:58:43

I just want to convert Pig's datetime format to epoch time so that I can do other calculations with the time. Here is (part of) my script:

DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
A = LOAD 's3://hearstlogfiles/google/NetworkBackfillImpressions_271283/2014/09/24/NetworkBackfillImpressions_271283_20140924_00.gz' USING PigStorage(',');
B = LIMIT A 10;
C = FOREACH B GENERATE
(chararray)(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),SUBSTRING($0, 11,19) )) as dt_string:chararray,
DATE_TIME(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),SUBSTRING($0, 11,19) )) AS dt;
D = FOREACH C GENERATE
dt_string,
dt,
ISOToUnix(dt)/1000 as epoch:long;
DUMP D;

When Pig tries to execute the line below, I get the error shown beneath it. I know I converted dt to the correct format.

ISOToUnix(dt)/1000 as epoch:long  
Could not infer the matching function for org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix as multiple or none of them fit. Please use an explicit cast.

When I dump C, I get the following, so I know the format of dt in C is correct.

(2014-09-24 02:53:54,2014-09-24T02:53:54.000Z)  
(2014-09-24 02:57:54,2014-09-24T02:57:54.000Z)
(2014-09-24 03:05:06,2014-09-24T03:05:06.000Z)
(2014-09-24 03:27:30,2014-09-24T03:27:30.000Z)
(2014-09-24 03:37:00,2014-09-24T03:37:00.000Z)
(2014-09-24 03:39:18,2014-09-24T03:39:18.000Z)
(2014-09-24 03:41:24,2014-09-24T03:41:24.000Z)
(2014-09-24 03:43:18,2014-09-24T03:43:18.000Z)
(2014-09-24 03:58:12,2014-09-24T03:58:12.000Z)

Please help.

Best answer

Here is the example pasted from https://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/ISOToUnix.html :

REGISTER /Users/me/commiter/piggybank/java/piggybank.jar ; 
REGISTER /Users/me/commiter/piggybank/java/lib/joda-time-1.6.jar ;
DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
ISOin = LOAD 'test.tsv' USING PigStorage('\t') AS (dt:chararray, dt2:chararray);

DESCRIBE ISOin;
ISOin: {dt: chararray,dt2: chararray}

DUMP ISOin;
(2009-01-07T01:07:01.000Z,2008-02-01T00:00:00.000Z)
(2008-02-06T02:06:02.000Z,2008-02-01T00:00:00.000Z)
(2007-03-05T03:05:03.000Z,2008-02-01T00:00:00.000Z)
...

toUnix = FOREACH ISOin GENERATE ISOToUnix(dt) AS unixTime:long;

DESCRIBE toUnix;
toUnix: {unixTime: long}
DUMP toUnix;
(1231290421000L)
(1202263562000L)
(1173063903000L)
...

Notice that dt, the argument passed to the ISOToUnix UDF, is a chararray. So you need to make your dt column a chararray as well, as follows:

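C = FOREACH B
    GENERATE
        -- human-readable form, date and time separated by a space
        (chararray)(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),
            SUBSTRING($0, 11,19) )) as dt_string:chararray,
        -- ISO 8601 form for ISOToUnix, date and time separated by 'T'
        CONCAT(CONCAT(SUBSTRING($0, 0,10),'T'),SUBSTRING($0, 11,19) ) AS dt:chararray;

D = FOREACH C
    GENERATE
        dt_string,
        dt,
        -- ISOToUnix returns milliseconds; divide by 1000 to get seconds
        ISOToUnix((chararray)dt)/1000 as epoch:long;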

DUMP D;
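
If the script runs on Pig 0.11 or later, where datetime is a native type, the built-in ToUnixTime function may be a simpler route than the piggybank UDF, since it takes a datetime directly and returns seconds since the epoch. The following is only a sketch under that assumption: the input path 'input.csv', the format pattern passed to ToDate, and the relation names are illustrative and do not come from the question.

```pig
-- Sketch only. Assumes Pig 0.11+ (native datetime type and built-in
-- ToDate / ToUnixTime). 'input.csv' is a placeholder path, and $0 is assumed
-- to hold the timestamp in the same layout as in the question.
A = LOAD 'input.csv' USING PigStorage(',');
B = LIMIT A 10;

-- Build the "yyyy-MM-dd HH:mm:ss" string exactly as the question does.
C = FOREACH B GENERATE
        CONCAT(CONCAT(SUBSTRING((chararray)$0, 0, 10), ' '),
               SUBSTRING((chararray)$0, 11, 19)) AS dt_string:chararray;

-- Parse the string into a datetime using an explicit format pattern.
D = FOREACH C GENERATE
        dt_string,
        ToDate(dt_string, 'yyyy-MM-dd HH:mm:ss') AS dt:datetime;

-- ToUnixTime accepts a datetime and returns seconds, so no division is needed.
E = FOREACH D GENERATE
        dt_string,
        dt,
        ToUnixTime(dt) AS epoch:long;

DESCRIBE E;  -- shows the field types, which a DUMP alone does not reveal
DUMP E;
```

Because the pattern carries no time zone, ToDate interprets the string in the session's default zone, so the epoch values line up with UTC only if that default is UTC. If you prefer to keep the piggybank UDF and already have a datetime field, wrapping it as ISOToUnix(ToString(dt)) should also satisfy the chararray signature, since ToString renders a datetime as an ISO 8601 string.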

Hope this helps.

Regarding "datetime - Pig - Could not infer the matching function for org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix as multiple or none of them fit", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26047829/
