
hadoop - Invalid scalar projection in PIG

Reposted. Author: 可可西里. Updated: 2023-11-01 14:51:49

My data in PIG has the following columns:

keyword, campaign_id, date, time, display_site, was_clicked, cpc, country, placement

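What I want to do is find the keywords with a high click-through rate (CTR).

So I'm trying to understand why the code below gives me an "Invalid scalar projection" error: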

grouped = GROUP data BY keyword;
by_keyword = FOREACH grouped
{
    clicked = FILTER data BY was_clicked == 1;
    total = COUNT(data.keyword);
    GENERATE group, ((double)COUNT(clicked) / total) AS ctr;
}

The error I get:

37,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<line 59, column 33> Invalid scalar projection: clicked : A column needs to be projected from a relation for it to be used as a scalar
Details at logfile: /home/cloudera/pig_1486224821223.log

Any help would be appreciated.

Edit:

data = LOAD '/user/cloudera/pig_demo/ad_data.txt' AS (keyword:chararray,campaign_id:chararray,
date:chararray, time:chararray,display_site:chararray, was_clicked:int,
cpc:int, country:chararray, placement:chararray);

Sample record:

tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP

Best answer

Pig version 0.15.

Input file data.txt:

tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP
tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP
tablet C6 5/1/2013 3:47:10 movienet.example.com 0 102 USA TOP
tablet C6 5/1/2013 3:47:10 movienet.example.com 1 102 USA TOP

Script:

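-- Load the ad data; was_clicked is 1 for a click and 0 otherwise.
data = LOAD '/path/data.txt' AS (keyword:chararray, campaign_id:chararray,
    date:chararray, time:chararray, display_site:chararray, was_clicked:int,
    cpc:int, country:chararray, placement:chararray);
grouped = GROUP data BY keyword;
by_keyword = FOREACH grouped
{
    -- Inside the nested block, "data" is the bag of rows for the current keyword.
    clicked = FILTER data BY was_clicked == 1;
    total = COUNT(data.keyword);
    -- Cast to double so the division is not integer division.
    GENERATE group, ((double)COUNT(clicked) / total) AS ctr;
}
DUMP by_keyword;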

This gives me the correct result:

(tablet,0.25)
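Since the stated goal is to find keywords with a high CTR, a shorter variant may also be worth trying. Assuming was_clicked only ever holds 0 or 1 (as in the sample records), the average of that column within each group equals clicks divided by total rows, so the builtin AVG can stand in for the nested FILTER and COUNT, and ORDER then ranks keywords by CTR. This is a sketch, not the answerer's script; the path /path/data.txt and the alias name ranked are placeholders.

```pig
-- Assumption: was_clicked is always 0 or 1, so its mean per keyword is the CTR.
data = LOAD '/path/data.txt' AS (keyword:chararray, campaign_id:chararray,
    date:chararray, time:chararray, display_site:chararray, was_clicked:int,
    cpc:int, country:chararray, placement:chararray);
grouped = GROUP data BY keyword;
-- AVG over an int column returns a double, so no explicit cast is needed.
by_keyword = FOREACH grouped GENERATE group AS keyword, AVG(data.was_clicked) AS ctr;
-- Highest click-through rates first ("ranked" is a hypothetical alias name).
ranked = ORDER by_keyword BY ctr DESC;
DUMP ranked;
```

With the four-line sample file above this should again yield (tablet,0.25). Note that AVG skips rows whose was_clicked is null, so it matches the original ratio only when that column is never null.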

A similar question about hadoop - Invalid scalar projection in PIG can be found on Stack Overflow: https://stackoverflow.com/questions/42050189/
