
clickhouse - Populating a materialized view in ClickHouse exceeds the memory limit


I am trying to create a materialized view with the ReplicatedAggregatingMergeTree engine on top of a table that uses the ReplicatedMergeTree engine.

After a few million rows I get DB::Exception: Memory limit (for query) exceeded. Is there a way to work around this?

CREATE MATERIALIZED VIEW IF NOT EXISTS shared.aggregated_calls_1h
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/shared/aggregated_calls_1h', '{replica}')
PARTITION BY toRelativeDayNum(retained_until_date)
ORDER BY (
    client_id,
    t,
    is_synthetic,
    source_application_ids,
    source_service_id,
    source_endpoint_id,
    destination_application_ids,
    destination_service_id,
    destination_endpoint_id,
    boundary_application_ids,
    process_snapshot_id,
    docker_snapshot_id,
    host_snapshot_id,
    cluster_snapshot_id,
    http_status
)
SETTINGS index_granularity = 8192
POPULATE
AS
SELECT
    client_id,
    toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
    date,
    toDate(retained_until_timestamp / 1000) AS retained_until_date,
    is_synthetic,
    source_application_ids,
    source_service_id,
    source_endpoint_id,
    destination_application_ids,
    destination_service_id,
    destination_endpoint_id,
    boundary_application_ids,
    http_status,
    process_snapshot_id,
    docker_snapshot_id,
    host_snapshot_id,
    cluster_snapshot_id,
    any(destination_endpoint) AS destination_endpoint,
    any(destination_endpoint_type) AS destination_endpoint_type,
    groupUniqArrayArrayState(destination_technologies) AS destination_technologies_state,
    minState(ingestion_time) AS min_ingestion_time_state,
    sumState(batchCount) AS sum_call_count_state,
    sumState(errorCount) AS sum_error_count_state,
    sumState(duration) AS sum_duration_state,
    minState(toUInt64(ceil(duration / batchCount))) AS min_duration_state,
    maxState(toUInt64(ceil(duration / batchCount))) AS max_duration_state,
    quantileTimingWeightedState(0.25)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p25_state,
    quantileTimingWeightedState(0.50)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p50_state,
    quantileTimingWeightedState(0.75)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p75_state,
    quantileTimingWeightedState(0.90)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p90_state,
    quantileTimingWeightedState(0.95)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p95_state,
    quantileTimingWeightedState(0.98)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p98_state,
    quantileTimingWeightedState(0.99)(toUInt64(ceil(duration / batchCount)), batchCount) AS latency_p99_state,
    quantileTimingWeightedState(0.25)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p25_large_state,
    quantileTimingWeightedState(0.50)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p50_large_state,
    quantileTimingWeightedState(0.75)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p75_large_state,
    quantileTimingWeightedState(0.90)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p90_large_state,
    quantileTimingWeightedState(0.95)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p95_large_state,
    quantileTimingWeightedState(0.98)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p98_large_state,
    quantileTimingWeightedState(0.99)(toUInt64(ceil(duration / batchCount) / 100), batchCount) AS latency_p99_large_state,
    sumState(minSelfTime) AS sum_min_self_time_state
FROM shared.calls_v2
WHERE sample_type != 'user_selected'
GROUP BY
    client_id,
    t,
    date,
    retained_until_date,
    is_synthetic,
    source_application_ids,
    source_service_id,
    source_endpoint_id,
    destination_application_ids,
    destination_service_id,
    destination_endpoint_id,
    boundary_application_ids,
    process_snapshot_id,
    docker_snapshot_id,
    host_snapshot_id,
    cluster_snapshot_id,
    http_status
HAVING destination_endpoint_type != 'INTERNAL'

Best answer

You can try increasing the limit with the clickhouse-client --max_memory_usage option.

--max_memory_usage arg "Maximum memory usage for processing of single query. Zero means unlimited."

https://clickhouse.yandex/docs/en/operations/settings/query_complexity/#settings_max_memory_usage
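For example, the same setting can also be raised for the session in which the POPULATE runs (the 20 GB value below is only an illustration; pick a limit that fits the memory actually available on the host):

-- Session-level equivalent of the --max_memory_usage CLI flag.
-- 20000000000 (about 20 GB) is an assumed value, not a recommendation.
SET max_memory_usage = 20000000000;

-- Then run the CREATE MATERIALIZED VIEW ... POPULATE statement from the
-- question in the same clickhouse-client session.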

Or, instead of using POPULATE, manually copy the data into the view's inner table:

INSERT INTO shared.`.inner.aggregated_calls_1h`
SELECT
    client_id,
    toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
    ...
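If the full backfill still exceeds the memory limit, the manual copy can be split into smaller pieces. The sketch below is an assumption, not part of the original answer: it runs the view's SELECT once per value of the source table's date column, so each INSERT only has to aggregate one day of shared.calls_v2 (the inner-table name follows the .inner.<view_name> convention used above).

-- Hypothetical batched backfill: one INSERT per day instead of POPULATE,
-- so a single query never aggregates the whole table at once.
INSERT INTO shared.`.inner.aggregated_calls_1h`
SELECT
    client_id,
    toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
    -- ... the remaining columns and -State aggregates exactly as in the
    -- view definition above ...
    sumState(minSelfTime) AS sum_min_self_time_state
FROM shared.calls_v2
WHERE sample_type != 'user_selected'
    AND date = '2019-08-01'   -- repeat the INSERT for every day present in shared.calls_v2
GROUP BY
    client_id, t, date, retained_until_date, is_synthetic,
    source_application_ids, source_service_id, source_endpoint_id,
    destination_application_ids, destination_service_id, destination_endpoint_id,
    boundary_application_ids, process_snapshot_id, docker_snapshot_id,
    host_snapshot_id, cluster_snapshot_id, http_status
HAVING destination_endpoint_type != 'INTERNAL'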

Regarding "clickhouse - Populating a materialized view in ClickHouse exceeds the memory limit", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57571242/
