I have a dataset that I am deduplicating with the following query:
select session_id, sol_id, id, session_context_code, date
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
I then tried to add the row count as a column named total:
select session_id, sol_id, id, session_context_code, date, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1
order by session_id, sol_id, date
This fails with:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 1:44 Expression not in GROUP BY key 'session_id'
I also tried adding a GROUP BY clause:
select session_id, solicit_id, nifi_date,id, session_context_code,count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id) as rn,
substr(case_id,2,9) as id
from df.t1_data
)undup
where undup.rn =1 and
session_context_code in ("4","3") and
order by session_id, sol_id, nifi_date
group by session_id, sol_id, nifi_date,id, session_context_code
which fails with:
ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:614 missing EOF at 'group' near 'nifi_date'
Best answer
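When a Hive query has COUNT(*) next to other columns in the SELECT clause, every one of those columns must be listed in a GROUP BY clause. GROUP BY goes after WHERE and before ORDER BY, which is why placing it after ORDER BY produces the ParseException.
Some examples:
SELECT COUNT(*) FROM employees;
SELECT id, name, COUNT(*) FROM employees GROUP BY id, name;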
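The two sample queries can be tried locally with a small sketch like the one below. It uses Python's built-in sqlite3 module as a stand-in for Hive, which is an assumption made only so the example runs without a cluster; the rows in employees are made up. Note that SQLite is more permissive than Hive about non-grouped columns, so it will not reproduce Error 10025. It only shows what the correctly grouped queries return.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "ann"), (1, "ann"), (2, "bob")],  # hypothetical sample rows
)

# COUNT(*) alone in SELECT: no GROUP BY is needed.
total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# id and name sit next to COUNT(*), so both are listed in GROUP BY.
per_group = conn.execute(
    "SELECT id, name, COUNT(*) AS total FROM employees "
    "GROUP BY id, name ORDER BY id, name"
).fetchall()

print(total)
print(per_group)
```

The first query returns one number for the whole table, while the second returns one row per distinct (id, name) pair with its own count.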
For the scenario in your question, the query should look like this:
select session_id, sol_id, id, session_context_code, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
substr(case_id,2,9) as id
from df.t1_data
) undup
where undup.rn = 1
group by session_id, sol_id, id, session_context_code
-- after GROUP BY, ORDER BY can only use grouped columns or aggregates
order by session_id, sol_id;
Or, to count per session_id and sol_id only:
select session_id, sol_id, count(*) as total
from (
select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
substr(case_id,2,9) as id
from df.t1_data
) undup
where undup.rn = 1
group by session_id, sol_id
order by session_id, sol_id;
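To see how the deduplication and the grouped count interact, here is a minimal sketch that runs the same query shape against a few made-up rows. It again assumes Python's sqlite3 module as a stand-in for Hive (window functions need SQLite 3.25 or newer), and the table contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1_data (session_id TEXT, sol_id TEXT, date TEXT, case_id TEXT)"
)
conn.executemany(
    "INSERT INTO t1_data VALUES (?, ?, ?, ?)",
    [
        ("s1", "a", "2020-03-01", "X000000001"),
        ("s1", "a", "2020-03-01", "X000000001"),  # duplicate of the row above
        ("s1", "a", "2020-03-02", "X000000002"),
        ("s2", "b", "2020-03-01", "X000000003"),
    ],
)

# Inner query keeps one row per (session_id, sol_id, date);
# outer query counts the surviving rows per (session_id, sol_id).
rows = conn.execute(
    """
    SELECT session_id, sol_id, COUNT(*) AS total
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) AS rn
        FROM t1_data
    ) undup
    WHERE undup.rn = 1
    GROUP BY session_id, sol_id
    ORDER BY session_id, sol_id
    """
).fetchall()

print(rows)
```

Because the duplicate row is removed before grouping, the count for ("s1", "a") reflects the two distinct dates rather than the three raw rows.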
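To show the count next to other, non-grouped columns, store the grouped counts in a temporary table and join it back to the source table:
CREATE TABLE tmp AS
SELECT a, b, count(*) AS total
FROM table1
GROUP BY a, b;
SELECT y.a, y.b, y.total, x.c, x.d, x.e, x.f
FROM tmp y
JOIN table1 x
ON y.a = x.a
AND y.b = x.b;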
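The temporary-table approach above can be sketched end to end as follows. This assumes Python's sqlite3 module as a stand-in for Hive, a made-up table1 with only the columns a, b and c, and a count column aliased as total so that it can be selected after the join; all of these are illustrative choices.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("x", "y", "c1"), ("x", "y", "c2"), ("p", "q", "c3")],  # hypothetical rows
)

# Step 1: materialize one count per (a, b) group.
conn.execute(
    "CREATE TABLE tmp AS "
    "SELECT a, b, COUNT(*) AS total FROM table1 GROUP BY a, b"
)

# Step 2: join the counts back so each detail row carries its group total.
rows = conn.execute(
    "SELECT x.a, x.b, x.c, y.total "
    "FROM tmp y JOIN table1 x ON y.a = x.a AND y.b = x.b "
    "ORDER BY x.c"
).fetchall()

print(rows)
```

Each detail row keeps its own value of c, and rows in the same (a, b) group share the same total.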
This question, "hadoop - Hive: how to output the total number of rows as a variable", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/60606332/