gpt4 book ai didi

sql - PIG查询可对行和列进行计数,以计数行数

转载 作者:行者123 更新时间:2023-12-02 21:46:03 25 4
gpt4 key购买 nike

尝试创建将产生大量不同值的sql或PIG查询会根据类型产生结果。

换句话说,给定该表:

Type:    Value:
A x
B y
C y
B y
C z
A x
A z
A z
A x
B x
B z
B x
C x

我想得到以下结果:
Type:    x:    y:    z:
A 3 0 2
B 2 2 1
C 1 1 1

此外,作为结果的平均值表也将有所帮助
Type:    x:    y:    z:
A 0.60 0.00 0.40
B 0.40 0.40 0.20
C 0.33 0.33 0.33

编辑4

我是PIG的nooby,但我读了8个不同的堆栈溢出文章。

当我使用此PIG查询时
A = LOAD 'tablex' USING org.apache.hcatalog.pig.HCatLoader();
x = foreach A GENERATE id_orig_h;
xx = distinct x;
y = foreach A GENERATE id_resp_h;
yy = distinct y;
yyy = group yy all;
zz = GROUP A BY (id_orig_h, id_resp_h);
B = CROSS xx, yy;
C = foreach B generate xx::id_orig_h as id_orig_h, yy::id_resp_h as id_resp_h;
D = foreach zz GENERATE flatten (group) as (id_orig_h, id_resp_h), COUNT(A) as count;
E = JOIN C by (id_orig_h, id_resp_h) LEFT OUTER, D BY (id_orig_h, id_resp_h);
F = foreach E generate C::id_orig_h as id_orig_h, C::id_resp_h as id_resp_h, D::count as count;
G = foreach yyy generate 0 as id:chararray, flatten(BagToTuple(yy));
H = group F by id_orig_h;
I = foreach H generate group as id_orig_h, flatten(BagToTuple(F.count)) as count;
dump G;
dump I;

有点作品......

我得到这个:
(0,x,y,z)
(A,3,0,2)
(B,2,2,1)
(C,1,1,1)

我可以将其导入到文本文件中,去掉“(”和“)”并用作CSV格式的第一行。这种工作是 SO SLOW 。我想要一个更好,更快,更干净的方法。如果那里有人知道方法,请告诉我。

最佳答案

我能想到的最好的方法是仅与Oracle一起使用,尽管它不会为您提供每个值的列,但它将为您显示以下数据:

A   x=3,y=3,z=3
B x=4,y=3
C y=3,z=2

当然,如果您有900个值,它将显示:
A  x=3,y=6,...,ff=12 

等等...

我无法添加评论,所以我不能问你甲骨文是否还可以。无论如何,这是可以实现此目的的查询:
SELECT type, values FROM 
(SELECT type, SUBSTR(SYS_CONNECT_BY_PATH(value || '=' || OCC, ','),2) values, seq,
MAX(seq) OVER (partition by type) max
FROM
(SELECT type, value, OCC, ROW_NUMBER () OVER (partition by type ORDER BY type, value) seq
FROM
(SELECT type, value, COUNT(*) OCC
FROM tableName
GROUP BY type, value))
START WITH seq=1
CONNECT by PRIOR
seq+1=seq
AND PRIOR
type=type)
WHERE seq = max;

对于平均值,您需要在所有其余信息之前添加信息,以下是代码:
SELECT * FROM 
(SELECT type,
SUBSTR(SYS_CONNECT_BY_PATH(value || '=' || OCC, ','),2) values,
SUBSTR(SYS_CONNECT_BY_PATH(value || '=' || (OCC / TOT), ','),2) average,
seq, MAX(seq) OVER (partition by type) max
FROM
(SELECT type, value, TOT, OCC, ROW_NUMBER () OVER (partition by type ORDER BY type, value) seq
FROM
(
SELECT type, value, TOT, COUNT(*) OCC
FROM (SELECT type, value, COUNT(*) OVER (partition by type) TOT
FROM tableName)
GROUP BY type, value, TOT
))
START WITH seq=1
CONNECT by PRIOR
seq+1=seq
AND PRIOR
type=type)
WHERE seq = max;

关于sql - PIG查询可对行和列进行计数,以计数行数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25267463/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com