
hadoop - Count and flatten in Pig

Reposted. Author: 可可西里. Updated: 2023-11-01 16:32:12

Hello, I have data like this:

{"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":"null"}], "source": "DBLP"}

{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}

{"user_id": "knuth86a", "type": "Book", "title": "TeX: The Program", "year": "1986", "publisher": "Addison-Wesley", "authors": [{"name":"Donald E. Knuth"}], "source": "DBLP"}...

I want to get the publisher and the title and then apply a count on each group, but with this script I get an error ("column needs..."):

books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

doc = group books by publisher;
res = foreach doc generate group,books.title,count(books.publisher);
DUMP res;

For a second query, I would like a structure like this: (name,year),title

So I tried this:

books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');


flat =group books by (generate FLATTEN((authors.name),year);
tab = foreach flat generate group, books.title;
DUMP tab;

But that does not work either...

Any ideas?

Best answer

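What error do you get when you try the first query? COUNT is a built-in function and must be written in all caps. Also, you cannot call COUNT(group); group is an internal identifier generated by Pig.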
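As a minimal sketch of the first query with the function name in uppercase, assuming the file path and schema from the question (the aliases `by_publisher`, `titles`, and `n` are illustrative names, not from the original):

```pig
-- Load the records with the schema given in the question.
books = LOAD 'data/book-seded-workings-reduced.json'
    USING JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

-- One group per publisher; each group carries a bag named "books".
by_publisher = GROUP books BY publisher;

-- COUNT is case-sensitive and takes the grouped bag, not the group key.
res = FOREACH by_publisher GENERATE group AS publisher, books.title AS titles, COUNT(books) AS n;
DUMP res;
```

Passing `books.publisher` to COUNT, as in the question, should also work; the bag itself is used here only because it is shorter. COUNT skips tuples whose first field is null, so COUNT_STAR may be the better choice if those rows need to be included.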

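I get the following result when I run your first query:

(Academic Press,{(Inequalities: Theory of Majorization and Its Application.)},1)
(Addison-Wesley,{(TeX: The Program)},1)
(ACM Press and Addison-Wesley,{(Modern Database Systems: The Object Model, Interoperability, and Beyond.)},1)

The expected (name,year),title format can be achieved like this: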

flat = foreach books generate FLATTEN(authors.name) as authorName, year, title;
tab = group flat by (authorName, year);
finaltab = foreach tab generate group, flat.title;
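The attempt in the question fails because GROUP ... BY cannot contain a nested generate; the bag has to be flattened in its own FOREACH first, and the result grouped afterwards. A sketch that combines both parts of the question, flattening the authors and then counting titles per (author, year), might look like the following. It assumes the same file path and schema as the question; `titles`, `n`, and `counted` are illustrative names.

```pig
books = LOAD 'data/book-seded-workings-reduced.json'
    USING JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

-- FLATTEN un-nests the authors bag: one row per (author, book) pair.
flat = FOREACH books GENERATE FLATTEN(authors.name) AS authorName, year, title;

-- Group on the composite key; the group key is a tuple (authorName, year).
tab = GROUP flat BY (authorName, year);

-- Flatten the key tuple back into two columns and count the rows in each group.
counted = FOREACH tab GENERATE FLATTEN(group) AS (authorName, year), flat.title AS titles, COUNT(flat) AS n;
DUMP counted;
```

A book with two authors contributes one row to each author's group, so the counts are per author rather than per book.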

Regarding "hadoop - Count and flatten in Pig", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25496029/
