
hadoop - Getting counts by iterating over a data bag, with a separate conditional count for each value of a field

Reposted. Author: 可可西里. Updated: 2023-11-01 15:13:22

Below is the data I have. Its schema is student_name, question_number, actual_result (either false or Correct).

(b,q1,Correct)
(a,q1,false)
(b,q2,Correct)
(a,q2,false)
(b,q3,false)
(a,q3,Correct)
(b,q4,false)
(a,q4,false)
(b,q5,false)
(a,q5,false)

What I want is, for each student (a and b), the total number of Correct answers and the total number of false answers.

Best answer

For the use case shared, the Pig script below is sufficient.

Pig script:

-- Load the CSV with an explicit schema.
student_data = LOAD 'student_data.csv' USING PigStorage(',') AS (student_name:chararray, question_number:chararray, actual_result:chararray);
-- Collect each student's rows into one bag.
student_data_grp = GROUP student_data BY student_name;
-- Nested FOREACH: filter each student's bag once per result value, then count the matches.
student_correct_answer_data = FOREACH student_data_grp {
    answers = student_data.actual_result;
    correct_answers = FILTER answers BY actual_result == 'Correct';
    incorrect_answers = FILTER answers BY actual_result == 'false';
    GENERATE group AS student_name, COUNT(correct_answers) AS correct_ans_count, COUNT(incorrect_answers) AS incorrect_ans_count;
};
DUMP student_correct_answer_data;

Input: student_data.csv:

b,q1,Correct
a,q1,false
b,q2,Correct
a,q2,false
b,q3,false
a,q3,Correct
b,q4,false
a,q4,false
b,q5,false
a,q5,false

Output: DUMP student_correct_answer_data:

-- schema : (student_name, correct_ans_count, incorrect_ans_count)
(a,1,4)
(b,2,3)
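
As a sanity check on the counts above, here is a minimal Python sketch (not Pig) of the same group-then-count logic. The names ROWS and count_results are hypothetical and used only for illustration; the rows are copied from student_data.csv as listed above.

```python
from collections import Counter, defaultdict

# Rows copied from student_data.csv: (student_name, question_number, actual_result).
# ROWS and count_results are illustrative names, not part of the Pig script.
ROWS = [
    ("b", "q1", "Correct"), ("a", "q1", "false"),
    ("b", "q2", "Correct"), ("a", "q2", "false"),
    ("b", "q3", "false"),   ("a", "q3", "Correct"),
    ("b", "q4", "false"),   ("a", "q4", "false"),
    ("b", "q5", "false"),   ("a", "q5", "false"),
]


def count_results(rows):
    """Return {student_name: (correct_count, incorrect_count)}."""
    # One Counter per student plays the role of GROUP ... BY student_name.
    per_student = defaultdict(Counter)
    for student_name, _question_number, actual_result in rows:
        per_student[student_name][actual_result] += 1
    # A Counter yields 0 for a result value the student never received,
    # which matches COUNT over an empty filtered bag.
    return {
        name: (counts["Correct"], counts["false"])
        for name, counts in sorted(per_student.items())
    }


if __name__ == "__main__":
    for name, (correct, incorrect) in count_results(ROWS).items():
        print(f"({name},{correct},{incorrect})")
```

Run on the sample rows, this prints (a,1,4) and (b,2,3), agreeing with the Pig output. Like the Pig filters, the comparison is exact and case-sensitive, so a misspelled value such as "flase" would be counted in neither column.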

References: more details on nested FOREACH

  1. http://pig.apache.org/docs/r0.12.0/basic.html#foreach
  2. http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach

The original question, "hadoop - Getting counts by iterating over a data bag, with a separate conditional count for each value of a field", can be found on Stack Overflow: https://stackoverflow.com/questions/30829277/
