
hadoop - HiveServer2 generates a lot of directories in hdfs /tmp/hive/hive

Reposted. Author: 可可西里. Updated: 2023-11-01 15:31:20

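We set up a new cluster with HiveServer2 (on the Hortonworks HDP 2.2 distribution). After a while, there are more than 1048576 directories under /tmp/hive/hive on HDFS, because the Hive server creates them in this location.

Has anyone run into a similar problem? Log from the Hive server: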

2015-08-31 06:48:15,828 WARN  [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.heapsize does not exist
2015-08-31 06:48:15,829 WARN [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.server2.enable.impersonation does not exist
2015-08-31 06:48:15,829 WARN [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
2015-08-31 06:48:15,833 INFO [HiveServer2-Handler-Pool: Thread-1104]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(232)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2015-08-31 06:48:15,835 INFO [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff_resources
2015-08-31 06:48:15,883 INFO [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff
2015-08-31 06:48:15,884 INFO [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff
2015-08-31 06:48:16,064 INFO [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff/_tmp_space.db
2015-08-31 06:48:16,065 INFO [HiveServer2-Handler-Pool: Thread-1104]: session.SessionState (SessionState.java:start(460)) - No Tez session required at this point. hive.execution.engine=mr.

The HiveServer2 method that runs when a session is created:

/**
 * Create dirs & session paths for this session:
 * 1. HDFS scratch dir
 * 2. Local scratch dir
 * 3. Local downloaded resource dir
 * 4. HDFS session path
 * 5. Local session path
 * 6. HDFS temp table space
 * @param userName
 * @throws IOException
 */
private void createSessionDirs(String userName) throws IOException {
  HiveConf conf = getConf();
  Path rootHDFSDirPath = createRootHDFSDir(conf);
  // Now create session specific dirs
  String scratchDirPermission = HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIRPERMISSION);
  Path path;
  // 1. HDFS scratch dir
  path = new Path(rootHDFSDirPath, userName);
  hdfsScratchDirURIString = path.toUri().toString();
  createPath(conf, path, scratchDirPermission, false, false);
  // 2. Local scratch dir
  path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR));
  createPath(conf, path, scratchDirPermission, true, false);
  // 3. Download resources dir
  path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));
  createPath(conf, path, scratchDirPermission, true, false);
  // Finally, create session paths for this session
  // Local & non-local tmp location is configurable. however it is the same across
  // all external file systems
  String sessionId = getSessionId();
  // 4. HDFS session path
  hdfsSessionPath = new Path(hdfsScratchDirURIString, sessionId);
  createPath(conf, hdfsSessionPath, scratchDirPermission, false, true);
  conf.set(HDFS_SESSION_PATH_KEY, hdfsSessionPath.toUri().toString());
  // 5. Local session path
  localSessionPath = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR), sessionId);
  createPath(conf, localSessionPath, scratchDirPermission, true, true);
  conf.set(LOCAL_SESSION_PATH_KEY, localSessionPath.toUri().toString());
  // 6. HDFS temp table space
  hdfsTmpTableSpace = new Path(hdfsSessionPath, TMP_PREFIX);
  createPath(conf, hdfsTmpTableSpace, scratchDirPermission, false, true);
  conf.set(TMP_TABLE_SPACE_KEY, hdfsTmpTableSpace.toUri().toString());
}
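
To connect this method to the log output, here is a minimal sketch that reproduces only the path composition for steps 1, 4, and 6 (HDFS scratch dir, HDFS session path, HDFS temp table space). The class `SessionPathSketch` and its method names are hypothetical. The literal `_tmp_space.db` is taken from the log lines, not from the Hive source, so it is an assumption about the value of `TMP_PREFIX`. The root `/tmp/hive` is likewise inferred from the logged paths.

```java
public class SessionPathSketch {
    // Assumed value of TMP_PREFIX, inferred from the log line
    // ".../ffd9e5e7-.../_tmp_space.db"; not confirmed against the Hive source.
    static final String ASSUMED_TMP_PREFIX = "_tmp_space.db";

    /** Step 1: per-user HDFS scratch dir, i.e. root + "/" + userName. */
    public static String hdfsScratchDir(String root, String userName) {
        return root + "/" + userName;
    }

    /** Step 4: per-session HDFS dir below the per-user scratch dir. */
    public static String hdfsSessionPath(String root, String userName, String sessionId) {
        return hdfsScratchDir(root, userName) + "/" + sessionId;
    }

    /** Step 6: temp table space below the per-session dir. */
    public static String hdfsTmpTableSpace(String root, String userName, String sessionId) {
        return hdfsSessionPath(root, userName, sessionId) + "/" + ASSUMED_TMP_PREFIX;
    }

    public static void main(String[] args) {
        String root = "/tmp/hive"; // assumed root scratch dir, inferred from the log
        String user = "hive";      // user seen in the logged paths
        String sessionId = "ffd9e5e7-7a4e-472e-b5f1-9c7f8acb0bff"; // session ID from the log
        System.out.println(hdfsScratchDir(root, user));
        System.out.println(hdfsSessionPath(root, user, sessionId));
        System.out.println(hdfsTmpTableSpace(root, user, sessionId));
    }
}
```

Read this way, each opened session adds one directory named after its session ID under /tmp/hive/hive, so any session whose directory is not removed afterwards leaves one more entry behind.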

Best answer

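We ran into a similar problem before. Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query finishes. However, if the Hive client terminates abnormally, some data may be left behind. The configuration details are as follows:

On the HDFS cluster, this is set to /tmp/hive- by default and is controlled by the configuration variable hive.exec.scratchdir. On the client machine, this is hardcoded to /tmp/.

Note that when writing data to a table/partition, Hive first writes to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then moves the data to the target table. This applies in all cases, whether the table is stored in HDFS (the normal case) or in file systems such as S3 or even NFS.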

Source

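So you can clean up the temporary location periodically with a manual script or a scheduled job, or use a cron shell script that removes data older than 30 or 60 days.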
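
As a rough sketch of the selection step such a cleanup job might perform, the Java snippet below picks directories whose last modification time is older than a retention window. The class `ScratchDirCleaner`, the type `DirEntry`, the example paths, and the 30-day window are all illustrative assumptions, not part of Hive. Listing the directories and deleting them (for example with Hadoop's `FileSystem.listStatus` and `FileSystem.delete`) is deliberately left out so the example stays self-contained.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScratchDirCleaner {
    /** Hypothetical stand-in for one directory listing entry. */
    public static final class DirEntry {
        final String path;
        final Instant modified;

        public DirEntry(String path, Instant modified) {
            this.path = path;
            this.modified = modified;
        }
    }

    /** Returns the paths last modified strictly before (now - retention). */
    public static List<String> selectExpired(List<DirEntry> entries, Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        List<String> expired = new ArrayList<>();
        for (DirEntry entry : entries) {
            if (entry.modified.isBefore(cutoff)) {
                expired.add(entry.path);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        // Fixed "now" so the example is reproducible; a real job would use Instant.now().
        Instant now = Instant.parse("2015-08-31T06:48:15Z");
        List<DirEntry> entries = Arrays.asList(
            new DirEntry("/tmp/hive/hive/old-session", now.minus(Duration.ofDays(45))),
            new DirEntry("/tmp/hive/hive/recent-session", now.minus(Duration.ofDays(2))));
        // Only report candidates here; the actual delete call is intentionally omitted.
        for (String path : selectExpired(entries, now, Duration.ofDays(30))) {
            System.out.println("would delete: " + path);
        }
    }
}
```

Selecting by age rather than deleting everything is meant to leave directories of recently opened sessions alone. Whether 30 or 60 days is safe depends on how long sessions stay open on the cluster.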

Regarding "hadoop - HiveServer2 generates a lot of directories in hdfs /tmp/hive/hive", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32306404/
