gpt4 book ai didi

java - 如何在配置单元中使用 serde 上传 twitter json 数据?

转载 作者:可可西里 更新时间:2023-11-01 16:11:31 25 4
gpt4 key购买 nike

我正在使用 Twitter 数据加载到配置单元中并对其进行一些查询:我的高音扬声器数据(原始)是:(仅限一种格式)-

{"created_at":"Tue Apr 28 23:28:15 +0000 2015","id":593195048306610176,"id_str":"593195048306610176","text":"Apple watch now has Tinder integration, now you can swipe on the go. This is revolutionary.","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":56632588,"id_str":"56632588","name":"Farmer Mike South","screen_name":"HunterPachell","location":"Bowling Green","url":"http:\/\/pornhub.com","description":"\u0394T\u0394 Bowling Green State University '16 BGSU Lax #2 See my latest highlights on http:\/\/pornhub.com","protected":false,"verified":false,"followers_count":439,"friends_count":997,"listed_count":1,"favourites_count":4548,"statuses_count":3702,"created_at":"Tue Jul 14 07:05:51 +0000 2009","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"050005","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_tile":false,"profile_link_color":"000000","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/56632588\/1420260655","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1430263695204"}

我正在使用外部表配置单元架构:

CREATE  External TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/user/hastimal/tweets';

但是当我做一些简单的事情时
select * from tweets limit 1;
它显示错误:

  hive> select * from tweets limit 1;

OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.json.JSONObject cannot be cast to [Ljava.lang.Object;

我还尝试了所有可用的 google/stackoverflow,我发现了这个 Loading Linkedin JSON response into HIVE
但没有工作。请帮忙......

最佳答案

我认为 jar 有问题。从下面的链接下载 hive-serde jar

files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar

添加jar文件

hive>  add jar  hive-serdes-1.0-SNAPSHOT.jar 

----创建表 --- ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' 像这样

 CREATE  External TABLE tweetsjson (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/hastimal/tweets';

如果不行告诉我

关于java - 如何在配置单元中使用 serde 上传 twitter json 数据?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29973351/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com