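I am trying to use Pig's DIFF() function to find the differences between two tables (a source table and a destination table), to achieve the following: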
sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:chararray,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:chararray,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
cogroupnew= COGROUP sourcenew by ID inner, destnew by ID inner;
diffnew = FOREACH cogroupnew GENERATE DIFF(sourcenew,destnew);
DUMP diffnew;
cogroupextrainsource= COGROUP sourcenew by ID inner, destnew by ID;
filterextrainsource= FILTER cogroupextrainsource BY ID NOT (cogroupnew)
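Best answer
You don't need the $ sign next to the column name ID. Use $ only when you want to refer to a column by position (for example $0 for the first field) rather than by name.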
cogroupextrainsource = COGROUP sourcenew by ID inner, destnew by ID;
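To go from that COGROUP to the rows that exist only in the source (the "NOT IN" part of the question), one common approach is to filter on an empty destination bag with the built-in IsEmpty function. The following is a minimal sketch, not the asker's final script: it assumes sourcenew and destnew are loaded exactly as in the question, and the relation name extrainsource is made up here for illustration.

```pig
-- Assumes sourcenew and destnew are already loaded as in the question.
-- Group both relations by ID; INNER on sourcenew drops IDs that have no source rows.
cogroupextrainsource = COGROUP sourcenew BY ID INNER, destnew BY ID;

-- Keep only the groups whose destnew bag is empty,
-- i.e. IDs that appear in the source but not in the destination.
filterextrainsource = FILTER cogroupextrainsource BY IsEmpty(destnew);

-- Flatten the source bag back into ordinary rows.
-- 'extrainsource' is a hypothetical name used only in this sketch.
extrainsource = FOREACH filterextrainsource GENERATE FLATTEN(sourcenew);

DUMP extrainsource;
```

Swapping the roles of the two relations (INNER on destnew, IsEmpty(sourcenew)) should give the rows that exist only in the destination.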
Regarding hadoop - NOT IN functionality in Pig, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41952027/