
hadoop - Ambari Hive UTF-8 issue

Reposted · Author: 可可西里 · Updated: 2023-11-01 15:27:09

I have a problem with Cyrillic characters in a Hive table. Installed versions:

ambari-server 2.4.2.0-136
hive-2-5-3-0-37 1.2.1000.2.5.3.0-37
Ubuntu 14.04

What the problem is:

  1. Set the locale to ru_RU.UTF-8:

    spark@hadoop:~$ locale
    LANG=ru_RU.UTF-8
    LANGUAGE=ru_RU:ru
    LC_CTYPE="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_PAPER="ru_RU.UTF-8"
    LC_NAME="ru_RU.UTF-8"
    LC_ADDRESS="ru_RU.UTF-8"
    LC_TELEPHONE="ru_RU.UTF-8"
    LC_MEASUREMENT="ru_RU.UTF-8"
    LC_IDENTIFICATION="ru_RU.UTF-8"
    LC_ALL=ru_RU.UTF-8
  2. Connect to Hive and create a test table:

    spark@hadoop:~$ beeline -n spark -u jdbc:hive2://spark@hadoop.domain.com:10000/

    Connecting to jdbc:hive2://spark@hadoop.domain.com:10000/
    Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
    Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive

    0: jdbc:hive2://spark@hadoop.domain.com> CREATE TABLE `test`(`name` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'serialization.encoding'='UTF-8');
    No rows affected (0,127 seconds)
  3. Insert Cyrillic characters:

    0: jdbc:hive2://spark@hadoop.domain.com> insert into test values('привет');

    INFO : Tez session hasn't been created yet. Opening session
    INFO : Dag name: insert into test values('привет')(Stage-1)
    INFO :

    INFO : Status: Running (Executing on YARN cluster with App id application_1490211406894_2481)

    INFO : Map 1: -/-
    INFO : Map 1: 0/1
    INFO : Map 1: 0(+1)/1
    INFO : Map 1: 1/1
    INFO : Loading data to table default.test from hdfs://hadoop.domain.com:8020/apps/hive/warehouse/test/.hive-staging_hive_2017-03-23_13-41-46_215_3133047104896717605-116/-ext-10000
    INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
    No rows affected (6,652 seconds)
  4. Select from the table:

    0: jdbc:hive2://spark@hadoop.domain.com> select * from test;
    +------------+--+
    | test.name |
    +------------+--+
    | ?@825B |
    +------------+--+
    1 row selected (0,162 seconds)
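The garbled value does not look random. Every letter of `привет` lies in the Cyrillic block (U+0400 to U+04FF), and keeping only the low 8 bits of each code point gives exactly `?@825B` (U+043F becomes 0x3F `?`, U+0440 becomes 0x40 `@`, and so on). The stats from step 3 fit the same picture: `rawDataSize=6` is one byte per letter, while UTF-8 needs 2 bytes per Cyrillic letter (12 bytes). The Python sketch below only reproduces that transformation; it is an illustration inferred from the output, not Hive's code, and the answer does not confirm that Hive truncates characters this way. `truncate_to_low_byte` is a hypothetical helper.

```python
# Sketch: reproduce the "?@825B" symptom by keeping only the low byte of
# each code point. This mimics the observed output; it is NOT Hive code.


def truncate_to_low_byte(text: str) -> bytes:
    """Hypothetical helper: keep only the low 8 bits of every character."""
    return bytes(ord(ch) & 0xFF for ch in text)


original = "привет"
stored = truncate_to_low_byte(original)

print(stored.decode("ascii"))         # ?@825B
print(len(stored))                    # 6, matches rawDataSize=6
print(len(original.encode("utf-8")))  # 12, what correct UTF-8 would take
print(len(stored) + 1)                # 7, matches totalSize=7 if rows end in "\n" (assumed)
```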

I've read through a lot of Apache Hive bug reports and tested unicode, utf-8, utf-16 and several ISO encodings, but had no luck.
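Assuming the table stats reflect what was actually written (6 bytes for 6 letters), this would also explain why switching encodings did not help: the bytes `3F 40 38 32 35 42` are not the encoding of `привет` in any of the usual charsets, so no reader-side `serialization.encoding` value can map them back. The check below uses Python codec names as stand-ins for the encodings mentioned; the exact values tried in the question are not given, and `matching_encodings` is a hypothetical helper.

```python
# Sketch: the bytes apparently stored (one per letter) do not match the
# encoding of "привет" in any common Cyrillic-capable charset, so decoding
# them with a different charset cannot recover the original text.
from typing import List

OBSERVED = b"?@825B"  # low bytes of U+043F U+0440 U+0438 U+0432 U+0435 U+0442
TEXT = "привет"

# Python codec names used as stand-ins for the encodings tried in the question.
CANDIDATES = ["utf-8", "utf-16", "utf-16-le", "iso8859-5", "koi8-r", "cp1251"]


def matching_encodings(observed: bytes, text: str, candidates: List[str]) -> List[str]:
    """Return the candidate encodings whose output for `text` equals `observed`."""
    return [enc for enc in candidates if text.encode(enc) == observed]


for enc in CANDIDATES:
    # Show the real byte sequence each charset would produce for the word.
    print(f"{enc:10} {TEXT.encode(enc).hex(' ')}")

print(matching_encodings(OBSERVED, TEXT, CANDIDATES))  # []
```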

Can anyone help me?

Thanks!

Best answer

The folks at Hortonworks helped me with this. It appears to be a bug:

https://community.hortonworks.com/answers/90989/view.html

https://issues.apache.org/jira/browse/HIVE-13983
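To check whether a particular Hive build is still affected, one option is to look at the bytes on disk rather than at how beeline renders them. A sketch under assumptions: copy the table's data file to the local machine (for example with `hdfs dfs -get`; the file name under `/apps/hive/warehouse/test/` is not shown in the question) and classify its first row. `classify_row` and the script name are hypothetical, written only for this illustration.

```python
# Sketch (hypothetical helper): classify the raw bytes of one row from a
# text-format table file, e.g. a local copy made with `hdfs dfs -get`.
import sys


def classify_row(raw: bytes, expected: str) -> str:
    """Say whether `raw` holds `expected` as UTF-8, as low bytes only, or neither."""
    raw = raw.rstrip(b"\n")  # assumes rows are terminated by a newline
    if raw == expected.encode("utf-8"):
        return "utf-8 ok"
    if raw == bytes(ord(ch) & 0xFF for ch in expected):
        return "low-byte truncated"
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_row.py <local copy of the data file>
    # Only the first row is checked, against the value inserted in the question.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(classify_row(f.readline(), "привет"))
    else:
        print(classify_row(b"?@825B\n", "привет"))  # low-byte truncated
```

A result of `utf-8 ok` with garbled beeline output would point at the read path instead of the write path.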

For "hadoop - Ambari Hive UTF-8 issue", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42973972/
