
hadoop - pig :Twitter Sentiment Analysis

Reposted · Author: 可可西里 · Updated: 2023-11-01 15:59:29

I'm trying to implement Twitter sentiment analysis. I need to get all the positive and negative tweets and store them in separate text files.

Sample JSON:

{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "google is a good company", "user_id": 450990391}
{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "facebook is a bad company", "user_id": 450990391}

dictionary.text contains the list of all positive and negative words:

weaksubj    1   bad     adj     n   negative
strongsubj 1 good adj n positive

Pig script:

tweets = load 'new.json' using JsonLoader('id:chararray,text:chararray,user_id:chararray,created_at:chararray');

dictionary = load 'dictionary.text' AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);

words = foreach tweets generate FLATTEN( TOKENIZE(text) ) AS word,id,text,user_id,created_at;

sentiment = join words by word left outer, dictionary by word;

senti2 = foreach sentiment generate words::id as id,words::created_at as created_at,words::text as text,words::user_id as user_id,dictionary::polarity as polarity;

res = FILTER senti2 BY polarity MATCHES '.*possitive.*';
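For reference, here is a minimal Python sketch (not Pig) of what the script is meant to compute, using the sample data above. It assumes one JSON tweet per line and dictionary fields separated by runs of whitespace; the helper names `load_polarity` and `classify` are hypothetical. Python's `str.split()` only splits on whitespace, whereas Pig's TOKENIZE also treats some punctuation as separators, so this is a simplification.

```python
import json

# Sample data from the question (assumed: one JSON object per line).
TWEETS = [
    '{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", '
    '"text": "google is a good company", "user_id": 450990391}',
    '{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", '
    '"text": "facebook is a bad company", "user_id": 450990391}',
]
DICTIONARY = [
    "weaksubj    1   bad     adj     n   negative",
    "strongsubj 1 good adj n positive",
]


def load_polarity(lines):
    """Hypothetical helper: map word -> polarity, splitting on any whitespace run."""
    lookup = {}
    for line in lines:
        _type, _length, word, _pos, _stemmed, polarity = line.split()
        lookup[word] = polarity
    return lookup


def classify(tweet_lines, dict_lines):
    """Hypothetical helper: tokenize each tweet, look every token up
    (the role of the left outer join) and group tweet texts by polarity.
    Like the join, a tweet with several matching words is added once per match."""
    lookup = load_polarity(dict_lines)
    result = {"positive": [], "negative": []}
    for line in tweet_lines:
        tweet = json.loads(line)
        for word in tweet["text"].split():
            polarity = lookup.get(word)  # None stands in for Pig's null
            if polarity in result:
                result[polarity].append(tweet["text"])
    return result


groups = classify(TWEETS, DICTIONARY)
print(groups["positive"])  # ['google is a good company']
print(groups["negative"])  # ['facebook is a bad company']
```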

DESCRIBE res:

res: {id: chararray,created_at: chararray,text: chararray,user_id: chararray,polarity: chararray}

But when I DUMP res, I don't see any output, even though the script runs without any errors.

What am I doing wrong here?

Please advise.

Mohan V

Best answer

I see 2 mistakes here:

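  • 1: Line 2 (the dictionary load) - When you DUMP dictionary, you will see each whole record in column 1 and the remaining columns empty. A LOAD without a USING clause falls back to PigStorage, whose default delimiter is a tab, so a space-separated line is never split into fields.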

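Solution: specify the appropriate delimiter with PigStorage(), e.g. USING PigStorage(' ') if the fields are separated by single spaces. Without it, the current load gives: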

dictionary = load 'dictionary.text' AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);

DUMP dictionary;
(weaksubj 1 bad adj n negative,,,,,)
(strongsubj 1 good adj n positive,,,,,)
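To see why this happens, here is a small Python sketch that mimics a delimiter-based loader: split the line on the delimiter and pad missing schema fields with None (Pig prints those nulls as empty). The `load_line` helper is hypothetical and only approximates PigStorage. Note that a single-space delimiter would also break on the first dictionary line, which uses runs of spaces, so the file needs one consistent separator (for example tabs, which PigStorage splits on by default).

```python
# Field names taken from the question's AS clause.
SCHEMA = ["type", "length", "word", "pos", "stemmed", "polarity"]


def load_line(line, delimiter="\t"):
    """Hypothetical helper: split on the delimiter, pad missing fields with
    None, and keep at most len(SCHEMA) fields."""
    fields = line.split(delimiter)
    fields += [None] * (len(SCHEMA) - len(fields))
    return tuple(fields[: len(SCHEMA)])


line = "strongsubj 1 good adj n positive"
print(load_line(line))       # ('strongsubj 1 good adj n positive', None, None, None, None, None)
print(load_line(line, " "))  # ('strongsubj', '1', 'good', 'adj', 'n', 'positive')
```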

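  • 2: Line 6 - Correct the spelling of positive (your filter says 'possitive'). Converting with UPPER also makes the match case-insensitive. Use something like: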

res = FILTER senti2 BY UPPER(polarity) MATCHES '.*POSITIVE.*';
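The effect of both filters can be sketched in Python. This assumes MATCHES behaves like a whole-string regex match and that a null polarity (a word with no dictionary entry after the left outer join) stays null through UPPER and is dropped by FILTER; `pig_matches` and `pig_upper` are hypothetical stand-ins, not Pig APIs.

```python
import re


def pig_matches(value, pattern):
    """Rough analogue of MATCHES: the regex must match the whole string;
    a None (null) input yields None, which FILTER treats as not passing."""
    if value is None:
        return None
    return re.fullmatch(pattern, value) is not None


def pig_upper(value):
    """UPPER of a null stays null; mirror that with None."""
    return None if value is None else value.upper()


polarities = ["positive", "Positive", "negative", None]

typo = [p for p in polarities if pig_matches(p, ".*possitive.*")]
fixed = [p for p in polarities if pig_matches(pig_upper(p), ".*POSITIVE.*")]
print(typo)   # []
print(fixed)  # ['positive', 'Positive']
```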

Regarding hadoop - pig :Twitter Sentiment Analysis, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39586045/
