
hadoop - How to achieve the expected output using Hive

Reposted  Author: 可可西里  Updated: 2023-11-01 14:45:52

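Table1 and Table2 are related: the timestamped entries for each px/coo combination in Table1 appear in Table2. I need the last entry for each px/coo combination. How can I achieve this with Hive? The expected output is shown below for reference.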

Table 1

px1 coo1
px1 coo2
px1 coo3
px2 coo2
px2 coo4
px3 coo3
px4 coo4

Table 2

id1 2014-01-01 21:23:23,273 px1 coo1
id2 2014-01-01 22:01:22,377 px1 coo1
id3 2014-01-01 22:25:06,196 px1 coo1
id4 2014-01-01 22:51:39,487 px1 coo1
id5 2014-01-01 02:05:57,875 px1 coo2
id6 2014-01-01 02:09:42,675 px1 coo2
id7 2014-01-01 23:19:42,059 px1 coo3
id8 2014-01-01 23:34:51,782 px1 coo3
id9 2014-01-01 06:13:05,531 px2 coo2
id10 2014-01-01 06:27:36,676 px2 coo2
id11 2014-01-01 06:59:43,999 px2 coo2
id12 2014-01-01 09:21:57,325 px3 coo3
id13 2014-01-01 17:19:06,956 px4 coo4
id14 2014-01-01 17:27:05,128 px4 coo4

The expected output should be

id4 2014-01-01 22:51:39,487 px1 coo1
id6 2014-01-01 02:09:42,675 px1 coo2
id8 2014-01-01 23:34:51,782 px1 coo3
id11 2014-01-01 06:59:43,999 px2 coo2
id12 2014-01-01 09:21:57,325 px3 coo3
id14 2014-01-01 17:27:05,128 px4 coo4

Best answer

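I am assuming that the last two columns of table2 (pix_id, coo_id) are consistent with table1. In other words, you can get the result by operating on table2 alone, because the pix_id/coo_id pairs are already correctly matched there. Apologies if this assumption is wrong.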

hive (sflow)> desc table2;
OK
col_name data_type comment
id string from deserializer
time_stamp string from deserializer
pix_id string from deserializer
coo_id string from deserializer
Time taken: 0.277 seconds

hive (sflow)>

SELECT t2.id, t2.time_stamp, t2.pix_id, t2.coo_id
FROM table2 t2
JOIN (
  SELECT pix_id, coo_id, MAX(UNIX_TIMESTAMP(time_stamp)) AS max_epoch
  FROM table2
  GROUP BY pix_id, coo_id
) temp
ON t2.pix_id = temp.pix_id AND t2.coo_id = temp.coo_id
WHERE UNIX_TIMESTAMP(t2.time_stamp) = temp.max_epoch;
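
One caveat with the query above: UNIX_TIMESTAMP returns whole seconds, so the millisecond part after the comma is not compared, and two rows of the same pix_id/coo_id group that fall within the same second would both be returned. Because time_stamp is stored as a fixed-width, zero-padded string, a variant that compares the strings directly may avoid this. The following is only a sketch: it assumes every value follows the yyyy-MM-dd HH:mm:ss,SSS layout shown in the sample data, the names latest and max_ts are illustrative, and it was not part of the original run.

```sql
-- Assumption: time_stamp is always 'yyyy-MM-dd HH:mm:ss,SSS',
-- so lexicographic string order matches chronological order.
SELECT t2.id, t2.time_stamp, t2.pix_id, t2.coo_id
FROM table2 t2
JOIN (
  -- Latest timestamp string per (pix_id, coo_id) group
  SELECT pix_id, coo_id, MAX(time_stamp) AS max_ts
  FROM table2
  GROUP BY pix_id, coo_id
) latest
ON t2.pix_id = latest.pix_id
   AND t2.coo_id = latest.coo_id
   AND t2.time_stamp = latest.max_ts;
```

If two rows in a group carry an identical timestamp string, both are still returned.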

PS: the full log is copied here (note that I am running Hadoop in pseudo-distributed mode, Hive 0.9, 2 GB RAM):

hive (sflow)> from table2 t2 join (select pix_id,coo_id, Max(UNIX_TIMESTAMP(time_stamp)) as max_epoch from table2 group by pix_id,coo_id) temp
> select t2.id,t2.time_stamp,t2.pix_id,t2.coo_id where t2.pix_id=temp.pix_id and t2.coo_id=temp.coo_id and UNIX_TIMESTAMP(t2.time_stamp) = max_epoch ;

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
Total MapReduce CPU Time Spent: 24 seconds 0 msec
OK
id time_stamp pix_id coo_id
id4 2014-01-01 22:51:39,487 px1 coo1
id6 2014-01-01 02:09:42,675 px1 coo2
id8 2014-01-01 23:34:51,782 px1 coo3
id11 2014-01-01 06:59:43,999 px2 coo2
id12 2014-01-01 09:21:57,325 px3 coo3
id14 2014-01-01 17:27:05,128 px4 coo4
Time taken: 145.17 seconds

hive (sflow)>
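
On Hive 0.11 or later, which added windowing functions, the same "latest row per group" result can probably be obtained without a self-join. This is an alternative to the approach above, not what was run in the log (Hive 0.9 does not support it). The names ranked and rn are illustrative, and the ordering again assumes that time_stamp strings sort chronologically.

```sql
-- Requires Hive 0.11+ (windowing functions); not runnable on Hive 0.9.
SELECT id, time_stamp, pix_id, coo_id
FROM (
  SELECT id, time_stamp, pix_id, coo_id,
         -- Number the rows of each (pix_id, coo_id) group, newest first
         ROW_NUMBER() OVER (PARTITION BY pix_id, coo_id
                            ORDER BY time_stamp DESC) AS rn
  FROM table2
) ranked
WHERE rn = 1;
```

Unlike the join-based query, ROW_NUMBER keeps exactly one row per group even when timestamps tie, although which of the tied rows survives is not defined.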

Regarding "hadoop - How to achieve the expected output using Hive", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21422560/
