
hadoop - How to divide numbers from different tables in Pig

Reposted. Author: 行者123. Updated: 2023-12-02 21:25:16

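I am trying to join two tables and divide a number from one table by a number from the other. I have tried doing this in the original database and also by generating a new table holding the same values, but I get the same error both times, which confuses me even more.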

--get the data 
lines = LOAD '/historicaldata.csv' USING PigStorage(' ') AS (ticker:chararray, date:long, open:long, high:long, low:long, close:long, volume:long);

--limit it between the dates we want
specDates = FILTER lines BY (date<=20000103 and date>=19900101);

--sort by ticker symbol
companies = GROUP specDates BY ticker;

--sort DESC and get the top to get the ending date
sorted_end = FOREACH companies {
    sorted1 = ORDER specDates BY date DESC;
    endDate = LIMIT sorted1 1;
    GENERATE endDate.ticker AS ticker, endDate.open AS open, endDate.close AS close;
}

--sort ASC and get the top to get the starting date
sorted_begin = FOREACH companies {
    sorted2 = ORDER specDates BY date ASC;
    startDate = LIMIT sorted2 1;
    GENERATE startDate.ticker AS ticker, startDate.open AS open, startDate.close AS close;
}

joined = JOIN sorted_end BY ticker, sorted_begin BY ticker;
final = FOREACH joined GENERATE sorted_end::ticker as ticker, sorted_begin::open as open, sorted_end::close as close;
final2 = FOREACH final GENERATE ticker as ticker, (float)(close/open) as growth_factor;

The error I keep getting is:
(Name: Divide Type: null Uid: null)incompatible types in Divide Operator left hand side:bag :tuple(close:float)  right hand side:bag :tuple(open:float) 

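Both are floats, so I'm not sure why they are "incompatible types", other than that they come from different bags, but adding them both to "final" and trying the division from there doesn't work either.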

The data format is:
AA,20140131,11.60,11.80,11.45,11.48,33014100
AA,20140130,12.05,12.07,11.83,11.92,23223500
AA,20140129,11.64,12.23,11.58,11.96,44433000

Every entry includes all columns and is correctly formatted, with non-zero numbers.

Best answer

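Based on your query, I created a dummy table on my system and generated the result. I found no problem; the division completed successfully. Below are some sample queries I ran in Pig: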

A = LOAD '/home/training/716391/pig/pigdata.csv' USING PigStorage(',') as (ID:INT, name:CHARARRAY, GPC:FLOAT);
B = LOAD '/home/training/716391/pig/pigdata2.csv' USING PigStorage(',') as (ID:INT, name:CHARARRAY, GPC:FLOAT);
C = join A by ID, B by ID;
D = FOREACH C generate A::ID as IDA, A::name as NAMEA, A::GPC as GPCA, B::ID as IDB, B::name as NAMEB, B::GPC as GPCB;
E = FOREACH D GENERATE IDA, (FLOAT)(GPCA/GPCB) AS VALUE;

Could you confirm that the divisor values in your case are neither null nor 0?
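
One way to run that check is sketched here against the same dummy tables as the sample queries. The relation names `zero_or_null` and `safe` are illustrative only, and the file paths are the ones from the answer:

```pig
-- Same dummy tables as in the sample queries; the paths are the answerer's.
A = LOAD '/home/training/716391/pig/pigdata.csv' USING PigStorage(',') AS (ID:int, name:chararray, GPC:float);
B = LOAD '/home/training/716391/pig/pigdata2.csv' USING PigStorage(',') AS (ID:int, name:chararray, GPC:float);
C = JOIN A BY ID, B BY ID;
D = FOREACH C GENERATE A::ID AS IDA, A::GPC AS GPCA, B::GPC AS GPCB;

-- List the rows whose divisor is null or zero (hypothetical relation name).
zero_or_null = FILTER D BY (GPCB IS NULL) OR (GPCB == 0);
DUMP zero_or_null;

-- Divide only the rows that have a usable divisor.
safe = FILTER D BY (GPCB IS NOT NULL) AND (GPCB != 0);
E = FOREACH safe GENERATE IDA, GPCA / GPCB AS VALUE;
DUMP E;
```

If `zero_or_null` comes back empty, the divisor values can be ruled out as the cause.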

Could you also share the load statements for sorted_end and sorted_begin?
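
In the meantime, note that the error message reports both operands as bags (`bag :tuple(close:float)`), not as floats. Inside a nested FOREACH, the result of LIMIT is still a bag holding one tuple, so a projection such as `endDate.close` is itself a bag, and Pig cannot divide one bag by another. A possible fix, offered as a sketch and not a confirmed solution, is to FLATTEN those one-tuple bags into plain fields before the join. The sketch also assumes a comma delimiter and `double` price columns, because the sample rows in the question are comma-separated decimals, whereas the posted LOAD uses `PigStorage(' ')` and `long`:

```pig
-- Assumption: comma-delimited input with decimal prices, as in the sample rows.
lines = LOAD '/historicaldata.csv' USING PigStorage(',')
        AS (ticker:chararray, date:long, open:double, high:double,
            low:double, close:double, volume:long);

-- Keep only the date range of interest, then group the rows per ticker.
specDates = FILTER lines BY (date <= 20000103 AND date >= 19900101);
companies = GROUP specDates BY ticker;

-- Latest row per ticker; FLATTEN turns the one-tuple bag into scalar fields.
sorted_end = FOREACH companies {
    sorted1 = ORDER specDates BY date DESC;
    endDate = LIMIT sorted1 1;
    GENERATE FLATTEN(endDate.(ticker, open, close)) AS (ticker, open, close);
}

-- Earliest row per ticker, flattened the same way.
sorted_begin = FOREACH companies {
    sorted2 = ORDER specDates BY date ASC;
    startDate = LIMIT sorted2 1;
    GENERATE FLATTEN(startDate.(ticker, open, close)) AS (ticker, open, close);
}

joined = JOIN sorted_end BY ticker, sorted_begin BY ticker;

-- Both operands are now scalar doubles, so the division type-checks.
growth = FOREACH joined GENERATE sorted_end::ticker AS ticker,
         sorted_end::close / sorted_begin::open AS growth_factor;
DUMP growth;
```

Running `DESCRIBE sorted_end;` before and after the change should show whether `open` and `close` are still wrapped in a bag.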

Regarding "hadoop - How to divide numbers from different tables in Pig", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36274797/
