
hadoop - Using JSON-SerDe in a Hive table


I am trying out the JSON-SerDe from the following link: http://code.google.com/p/hive-json-serde/wiki/GettingStarted .

         CREATE TABLE my_table (field1 string, field2 int, 
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;

I added the JSON-SerDe jar with

          ADD JAR /path-to/hive-json-serde.jar;

and loaded the data with

LOAD DATA LOCAL INPATH  '/home/hduser/pradi/Test.json' INTO TABLE my_table;

The data loaded successfully.

But when I query the data with

select * from my_table;

I get only one row back from the table:

data1   100     more data1      123.001

Test.json contains:

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001} 

{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}

{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}

{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

Where is the problem? Why does the query return only one row instead of 4? The file under /user/hive/warehouse/my_table contains all 4 rows!!


hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar

hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
> field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
> "field1"="$.field1",
> "field2"="$.field2",
> "field3"="$.field3",
> "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds

hive> LOAD DATA LOCAL INPATH '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds

hive> select * from my_table;
OK
data1 100 more data1 123.001
Time taken: 0.17 seconds

I have already posted the contents of the test.json file, so you can see that the query result is only one row:

data1   100     more data1      123.001

I have since changed the JSON file to employee.json, which contains

{ "firstName":"Mike", "lastName":"Chepesky", "employeeNumber":1840192 }

and changed the table accordingly, but when I query the table it shows NULL values:

hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar

hive> create EXTERNAL table employees_json (firstName string, lastName string, employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds


hive> load data local inpath '/home/hduser/pradi/employees.json' into table employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds


hive> select * from employees_json;
OK
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Time taken: 0.194 seconds

Best answer

It is hard to say what is going on here without the logs (see Getting Started in case of doubt). Just a quick thought - could you try whether it works with WITH SERDEPROPERTIES:

CREATE EXTERNAL TABLE my_table (field1 string, field2 int, 
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"field1"="$.field1",
"field2"="$.field2",
"field3"="$.field3",
"field4"="$.field4"
);
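
If the SERDEPROPERTIES mapping is what makes the first table work, the same pattern could also be tried on employees_json. The following is only a sketch: the column-to-JSONPath mapping syntax is copied from the statement above, the key names are taken from the posted employee.json, and whether the property keys should match the declared camelCase names or Hive's lowercased column names is an assumption you would need to verify:

-- Sketch: map each Hive column to the JSON key it should read.
-- If the SerDe only sees lowercased column names, the property keys may need to be
-- "firstname"="$.firstName" etc. instead.
CREATE EXTERNAL TABLE employees_json (firstName string, lastName string, employeeNumber int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"firstName"="$.firstName",
"lastName"="$.lastName",
"employeeNumber"="$.employeeNumber"
);

Since the table is EXTERNAL, dropping it and re-creating it with the mapping leaves the already-loaded data file in place.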

There is also a fork from ThinkBigAnalytics that you might want to try.

Update: it turned out that the input in Test.json was invalid JSON, which is why the records got collapsed.
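
For reference, this SerDe is fed its input one line at a time by Hive's text input format, so each record needs to be a single, complete JSON object on its own line. Using the values from the question, a Test.json that should load as four separate rows would look like the following (assuming the line layout was the only problem):

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

With the file in that shape, select * from my_table; should return all four rows; a pretty-printed or otherwise malformed object tends to show up as NULL columns or a collapsed result instead.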

See the answer at https://stackoverflow.com/a/11707993/396567 for more details.

Regarding hadoop - Using JSON-SerDe in a Hive table, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14705858/
