foreach - Filtering inside a FOREACH block by a conditional value computed in the same block in PIG

Reposted. Author: 行者123. Updated: 2023-12-05 01:04:55

I have a log dataset, and I need to filter out all of a device's log entries that occur after its first failure (action = 2).

In this example:

EquipId, SvcId, Action, TimeStamp
Ag,01,1,14-01-01 0:00:01
Ag,01,1,14-01-02 0:00:01
Ag,01,2,14-01-03 0:00:01
Ag,01,1,14-01-04 0:00:01
Ag,01,1,14-01-05 0:00:01
Ag,01,2,14-01-06 0:00:01
Ag,01,1,14-01-07 0:00:01
Ra,01,1,14-01-01 0:00:01
Ra,01,1,14-01-02 0:00:01
Ra,01,1,14-01-03 0:00:01
Ra,01,2,14-01-04 0:00:01
Fe,01,2,14-01-03 0:00:01
Fe,01,1,14-01-03 0:00:02
Fe,01,1,14-01-04 0:00:01
Lu,01,1,14-01-05 0:00:01
Lu,01,1,14-01-04 0:00:01
Lu,01,1,14-01-05 0:00:01

The expected output would be:
Ag,01,1,14-01-01 0:00:01
Ag,01,1,14-01-02 0:00:01
Ag,01,2,14-01-03 0:00:01
Ra,01,1,14-01-01 0:00:01
Ra,01,1,14-01-02 0:00:01
Ra,01,1,14-01-03 0:00:01
Ra,01,2,14-01-04 0:00:01
Fe,01,2,14-01-03 0:00:01
Lu,01,1,14-01-05 0:00:01
Lu,01,1,14-01-04 0:00:01
Lu,01,1,14-01-05 0:00:01

I tried to program this in a single FOREACH block like this:
rawData = LOAD './test.csv'  USING PigStorage(',') AS (equipId:chararray, svcId:chararray, action:chararray, date:chararray);

equipDataGrp = GROUP rawData BY equipId;

minFail = FOREACH equipDataGrp {

actionFail = FILTER rawData BY action == '2';
minFailDate = MIN(actionFail.date);
prevActionsFail = FILTER rawData BY date <= minFailDate;


GENERATE group as equipId, FLATTEN(prevActionsFail.date);

};

I get the following error:
2014-03-05 11:08:11,720 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: 
<line 36, column 28> Invalid field reference. Referenced field [date] does not exist in schema: .

If I hard-code the date like this:
minFail = FOREACH equipDataGrp {

actionFail = FILTER rawData BY action == '2';
minFailDate = MIN(actionFail.date);
prevActionsFail = FILTER rawData BY date == '14-01-03 0:00:01';


GENERATE group as equipId, FLATTEN(prevActionsFail.date);

};

I get this response:
(Ag,14-01-03 0:00:01)
(Fe,14-01-03 0:00:01)
(Ra,14-01-03 0:00:01)

Any suggestions?

Thanks in advance!

Best answer

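You need to compute the failure time and distribute it to all of the records for that equipment ID. Then you can filter out the records whose timestamp is later than that time: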

rawData = LOAD './test.csv'  USING PigStorage(',') AS (equipId:chararray, svcId:chararray, action:chararray, date:chararray);

equipDataGrp = GROUP rawData BY equipId;

/* Expand out into all records again, appending the earliest failure time */
minFail = FOREACH equipDataGrp {
actionFail = FILTER rawData BY action == '2';
GENERATE FLATTEN(rawData), MIN(actionFail.date) AS failTime;
};

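/* Keep records up to and including the first failure. Equipment that never failed
   has a null failTime, and a comparison against null would drop its records,
   so those records are kept explicitly. */
notYetFailed = FOREACH (FILTER minFail BY (failTime IS NULL) OR (date <= failTime)) GENERATE equipId .. date;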
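
The two steps described in this answer (find each device's earliest failure time, then keep only the rows at or before it) can be checked against the sample data outside Pig. The following is a minimal Python sketch of that logic, not a translation of the Pig script. The names `ROWS` and `keep_until_first_failure` are made up for illustration, and the sketch assumes that devices with no failure at all should keep every row, as the `Lu` rows in the expected output suggest.

```python
# Sample rows from the question: (equip_id, svc_id, action, timestamp).
ROWS = [
    ("Ag", "01", "1", "14-01-01 0:00:01"),
    ("Ag", "01", "1", "14-01-02 0:00:01"),
    ("Ag", "01", "2", "14-01-03 0:00:01"),
    ("Ag", "01", "1", "14-01-04 0:00:01"),
    ("Ag", "01", "1", "14-01-05 0:00:01"),
    ("Ag", "01", "2", "14-01-06 0:00:01"),
    ("Ag", "01", "1", "14-01-07 0:00:01"),
    ("Ra", "01", "1", "14-01-01 0:00:01"),
    ("Ra", "01", "1", "14-01-02 0:00:01"),
    ("Ra", "01", "1", "14-01-03 0:00:01"),
    ("Ra", "01", "2", "14-01-04 0:00:01"),
    ("Fe", "01", "2", "14-01-03 0:00:01"),
    ("Fe", "01", "1", "14-01-03 0:00:02"),
    ("Fe", "01", "1", "14-01-04 0:00:01"),
    ("Lu", "01", "1", "14-01-05 0:00:01"),
    ("Lu", "01", "1", "14-01-04 0:00:01"),
    ("Lu", "01", "1", "14-01-05 0:00:01"),
]


def keep_until_first_failure(rows, fail_action="2"):
    """Return the rows up to and including each device's earliest failure."""
    # Step 1: earliest failure timestamp per device
    # (the counterpart of MIN over the filtered bag).
    first_fail = {}
    for equip_id, _svc_id, action, ts in rows:
        if action == fail_action:
            if equip_id not in first_fail or ts < first_fail[equip_id]:
                first_fail[equip_id] = ts

    # Step 2: keep a row if its device never failed, or if the row is
    # not later than that device's first failure.
    kept = []
    for row in rows:
        fail_time = first_fail.get(row[0])  # None when the device never failed
        if fail_time is None or row[3] <= fail_time:
            kept.append(row)
    return kept


if __name__ == "__main__":
    for row in keep_until_first_failure(ROWS):
        print(",".join(row))
```

Like the Pig script, which loads `date` as a chararray, this sketch compares timestamps as plain strings. That works for the sample because every hour field is `0`, but unpadded hours such as `9:00:00` and `10:00:00` would sort incorrectly, so real data would likely need the timestamps parsed into a proper date type first.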

Regarding "foreach - Filtering inside a FOREACH block by a conditional value computed in the same block in PIG", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22196120/
