hadoop - pig : How to remove '::' in the column name


Reposted by 可可西里, last updated 2023-11-01 14:43:03

I have a Pig relation like the following:

FINAL = {input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray, test_1::type: chararray, test_2::name: chararray}

I am trying to store all the input_md5 columns of this relation into a Hive table, i.e. input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long and input_md5::age: chararray, without taking test_1::type: chararray and test_2::name: chararray.

Is there any command in Pig that selects only the input_md5 columns? Something like the following:
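Here is a minimal sketch of that selection. It assumes you have copied the field list printed by DESCRIBE FINAL into Python. The hypothetical helper select_prefixed_columns is not a Pig command. It keeps only the fields qualified with input_md5:: and pairs each one with its bare name.

```python
# Hypothetical helper, not part of Pig. It works on the field list that
# DESCRIBE FINAL prints, e.g. [("input_md5::type", "chararray"), ...].
def select_prefixed_columns(fields, alias):
    """Return (qualified_name, bare_name, type) for fields qualified by `alias`."""
    prefix = alias + "::"
    selected = []
    for name, pig_type in fields:
        if name.startswith(prefix):
            # Strip only the leading "alias::" to get the Hive-style column name.
            selected.append((name, name[len(prefix):], pig_type))
    return selected


if __name__ == "__main__":
    final_schema = [
        ("input_md5::type", "chararray"),
        ("input_md5::name", "chararray"),
        ("input_md5::id", "long"),
        ("input_md5::age", "chararray"),
        ("test_1::type", "chararray"),
        ("test_2::name", "chararray"),
    ]
    for qualified, bare, pig_type in select_prefixed_columns(final_schema, "input_md5"):
        print(qualified, "->", bare, pig_type)
```

The test_1 and test_2 fields are dropped because their prefix does not match.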

STORE = FOREACH FINAL GENERATE all input_md5::type; I know Pig has the following syntax:

FOREACH FINAL GENERATE input_md5::type AS type, but I have many columns, so I can't write an AS clause for each one in my code.
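Typing the AS clauses by hand is tedious, but you can generate them. The Python sketch below writes the Pig Latin statement from a list of column names. The helper build_rename_foreach and the alias TO_STORE are hypothetical; STORE itself is a Pig keyword, which is why a different alias is used. The column list must follow the Hive table's column order.

```python
# Hypothetical code generator: it writes the Pig Latin statement so the
# AS clauses do not have to be typed by hand. Relation names are examples.
def build_rename_foreach(target, source, alias, columns):
    """Build 'target = FOREACH source GENERATE alias::c AS c, ...;'."""
    projections = ", ".join(f"{alias}::{col} AS {col}" for col in columns)
    return f"{target} = FOREACH {source} GENERATE {projections};"


if __name__ == "__main__":
    # Keep this list in the same order as the Hive table's columns.
    cols = ["type", "name", "id", "age"]
    print(build_rename_foreach("TO_STORE", "FINAL", "input_md5", cols))
```

With the four input_md5 columns this prints TO_STORE = FOREACH FINAL GENERATE input_md5::type AS type, input_md5::name AS name, input_md5::id AS id, input_md5::age AS age; which projects only the input_md5 columns under their bare names.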

The reason is that when I try STORE = FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;

Pig 抛出一个错误:

org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>
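The message suggests that the table's columns are compared with the relation's field names position by position. The qualified name input_md5::type is not equal to type, so the check fails at position 1. The sketch below is a simplified model of that comparison, for illustration only. It is not HCatalog's actual code, and first_position_mismatch is a hypothetical name.

```python
# Simplified model only: a by-position name comparison like the one the
# error message describes. It is not HCatalog's implementation.
def first_position_mismatch(table_columns, relation_fields):
    """Return (position, expected, found) for the first mismatch, else None.

    Positions are 1-based, matching the wording of the error message.
    Only the overlapping positions are compared (zip stops at the shorter list).
    """
    pairs = zip(table_columns, relation_fields)
    for pos, (expected, found) in enumerate(pairs, start=1):
        if expected != found:
            return pos, expected, found
    return None


if __name__ == "__main__":
    hive_columns = ["type", "name", "id", "age"]
    pig_fields = ["input_md5::type", "input_md5::name", "input_md5::id", "input_md5::age"]
    print(first_position_mismatch(hive_columns, pig_fields))
```

Run on the names from the question, this prints (1, 'type', 'input_md5::type'), the same pair the error message reports.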

Thanks in advance.

Best answer

I solved this problem. The fix is as follows:

Create a relation with some filter condition, as shown below:

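DUMMY_RELATION = FILTER SOURCE_TABLE BY type == ''; (I used a column named type here, but you can filter on any column of the table. What matters is that we need its schema, so ideally pick a condition that matches no rows: any row the filter keeps will also be included in the UNION below and end up in the stored output.)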

FINAL_DATASET= UNION DUMMY_RELATION,SCHEMA_1,SCHEMA_2;
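UNION keeps every tuple of every input, so any rows the dummy FILTER happens to match also become part of FINAL_DATASET. The toy model below makes that visible, with Python lists standing in for relations. pig_filter, pig_union and all the data are made up for illustration.

```python
# Toy model: lists of dicts stand in for Pig relations. Names and data are
# hypothetical; real Pig UNION does not guarantee any output order.
def pig_filter(rows, predicate):
    """FILTER keeps every row for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def pig_union(*relations):
    """UNION keeps all rows of all inputs (no de-duplication)."""
    result = []
    for rel in relations:
        result.extend(rel)
    return result


if __name__ == "__main__":
    source_table = [
        {"type": "x", "name": "a"},
        {"type": "", "name": "b"},  # this row matches the dummy filter
    ]
    dummy_relation = pig_filter(source_table, lambda row: row["type"] == "")
    schema_1 = [{"type": "y", "name": "c"}]
    schema_2 = [{"type": "z", "name": "d"}]
    final_dataset = pig_union(dummy_relation, schema_1, schema_2)
    print(len(final_dataset))  # 3: the matched dummy row is included
```

If the dummy condition matches nothing, DUMMY_RELATION contributes only its schema and no rows.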

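(This new DUMMY_RELATION must be placed first in the UNION.) Now you no longer have the :: operator, and your column names will match the Hive table's column names, provided that your source table (the one DUMMY_RELATION is built from) and the target table have the same column order.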
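The toy model below captures the rule this answer relies on. When the UNION inputs have the same number of fields, the result is described with the first input's field names. This rule is an assumption drawn from the answer, not Pig's schema-merge code, and the behaviour may vary between Pig versions, so check DESCRIBE FINAL_DATASET before storing. union_schema_names is a hypothetical helper.

```python
# Toy model of the behaviour the answer reports; not Pig's actual rules.
def union_schema_names(*schemas):
    """If all inputs have the same number of fields, keep the first input's
    field names; otherwise return None (no common positional schema)."""
    arity = len(schemas[0])
    if any(len(s) != arity for s in schemas):
        return None
    return list(schemas[0])


if __name__ == "__main__":
    dummy = ["type", "name", "id", "age"]
    schema_1 = ["input_md5::type", "input_md5::name", "input_md5::id", "input_md5::age"]
    # DUMMY_RELATION first: the bare names survive.
    print(union_schema_names(dummy, schema_1))
    # DUMMY_RELATION not first: the qualified names survive instead.
    print(union_schema_names(schema_1, dummy))
```

The second call shows why, under this model, the order of relations in the UNION matters.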

Thanks to myself :)

Regarding hadoop - pig : How to remove '::' in the column name, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38902046/