
hadoop - Pig - Store a complex relation schema in a Hive table

Reposted. Author: 行者123. Updated: 2023-12-02 20:52:24

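Here is my problem. After reading a relation from Hive, I built a new relation from it through two transformations. After some analysis, I want to store the final relation back in Hive, but I can't. The code below shows the issue.

The first snippet loads the data from Hive and transforms the result: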

july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) AS day:int, start_station, duration;
jul_cl_fl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl_fl BY (day, start_station);
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group), total_dura, avg_dura, qty_trips;
};

Now, when I try to store the relation july_result, it fails. The schema has changed, and I think it is no longer compatible with the Hive table:

STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
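A quick way to see what changed is to ask Pig for the schema it inferred. This is a minimal sketch that assumes only the relation names from the script above; the exact types printed depend on the column types of POC.july, which the question does not show.

```pig
-- Print the schema Pig inferred for the aggregated relation.
-- Fields produced by FLATTEN(group) carry a prefix (for example group::day).
-- COUNT returns a long and AVG returns a double, so those fields are not
-- int/float unless they are cast explicitly.
DESCRIBE july_result;
```

Comparing this output with the column names and types of the target Hive table should show which fields do not line up.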

Even when I tried to declare an explicit schema for the final relation, I could not get it to work:
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group) as (day:int),total_dura as (total_dura:int),avg_dura as (avg_dura:int),qty_trips as (qty_trips:int);
};

Best answer

After some research in the Hortonworks community, I found out how to define the output format for a grouped relation in Pig. My new code looks like this:

july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group) AS (day, code_station),
             (int)total_dura AS (total_dura:int),
             (float)avg_dura AS (avg_dura:float),
             (int)qty_trips AS (qty_trips:int);
};
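With the casts and field names in place, the store step from the question can be retried. The following is only a sketch, not the poster's confirmed final script. It assumes that the Hive table poc.july_analysis already exists, that its columns are named day, code_station, total_dura, avg_dura and qty_trips with types matching the casts above (the table definition is not shown in the source), and that the script is launched with `pig -useHCatalog` so the HCatalog classes are available.

```pig
-- Check that the field names and types now match the target table.
DESCRIBE july_result;

-- Assumption: poc.july_analysis was created in Hive beforehand with
-- columns (day, code_station, total_dura, avg_dura, qty_trips).
STORE july_result INTO 'poc.july_analysis'
    USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Running DESCRIBE first makes a remaining name or type mismatch visible before the store job is submitted.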

Thanks, everyone.

Regarding hadoop - Pig - Store a complex relation schema in a Hive table, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45602439/
