hadoop - Storing records in separate files using MultiStorage

Reposted. Author: 可可西里. Updated: 2023-11-01 15:38:49

I am trying to store a set of records like this:

2342514224232 | some text here whatever
2342514224234| some more text here whatever

...as separate files in the output folder, like this:

output/2342514224232
output/2342514224234

The value of idstr should be the file name, and the text should be the file's contents. Here is my Pig code:

REGISTER /home/bytebiscuit/pig-0.11.1/contrib/piggybank/java/piggybank.jar;

A = LOAD 'cleantweets.csv' using PigStorage(',') AS (idstr:chararray, createdat:chararray, text:chararray,followers:int,friends:int,language:chararray,city:chararray,country:chararray,lat:chararray,lon:chararray);

B = FOREACH A GENERATE idstr, text, language, country;

C = FILTER B BY (country == 'United States' OR country == 'United Kingdom') AND language == 'en';

texts = FOREACH C GENERATE idstr,text;

STORE texts INTO 'output/query_results_one' USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0');

Running this Pig script produces the following error:

<file pigquery1.pig, line 12, column 0> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.MultiStorage' with arguments '[output/query_results_one, idstr]'

Any help is much appreciated!

Best answer

Try this option:

 MultiStorage('output/query_results_one', '0', 'none', ',');
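In context, the four-argument call might be used as in the sketch below. It reuses the relation names, schema, and jar path from the question, which are assumptions about your setup. Per the piggybank Javadoc, the arguments are the parent output path, the zero-based index of the field to split on (a number, not a field name such as 'idstr'), the compression type, and the output field delimiter.

```pig
-- Jar path copied from the question; adjust it for your installation.
REGISTER /home/bytebiscuit/pig-0.11.1/contrib/piggybank/java/piggybank.jar;

-- Same input and schema as in the question.
A = LOAD 'cleantweets.csv' USING PigStorage(',') AS (idstr:chararray, createdat:chararray, text:chararray, followers:int, friends:int, language:chararray, city:chararray, country:chararray, lat:chararray, lon:chararray);

B = FOREACH A GENERATE idstr, text, language, country;

C = FILTER B BY (country == 'United States' OR country == 'United Kingdom') AND language == 'en';

-- idstr is field 0 of this relation, so '0' is the split-field index.
texts = FOREACH C GENERATE idstr, text;

-- Arguments: parent path, split-field index, compression, field delimiter.
STORE texts INTO 'output/query_results_one'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/query_results_one', '0', 'none', ',');
```

As far as the Javadoc describes, MultiStorage creates one directory per distinct value of the split field under the parent path, with part files inside it (for example output/query_results_one/2342514224232/2342514224232-0000), rather than a single file named after the ID.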

Regarding hadoop - Storing records in separate files using MultiStorage, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20550637/
