
django - Why does Elasticsearch in docker perform poorly when indexing a small amount of data?

Reposted. Author: 行者123. Updated: 2023-12-05 03:56:17

I am trying to set up a django application with elasticsearch in docker using docker-compose. Building a small index takes about 15 minutes in docker. If I run the same command outside docker, it finishes within 30 seconds.

Here is my docker-compose.yml, which is based on the official docker installation guide:

version: '3'

services:

  web:
    build:
      context: ../..
      dockerfile: compose/local/Dockerfile
    restart: on-failure
    volumes:
      - ../..:/var/www/chesno
    env_file:
      - ../../.env.local
    depends_on:
      - elasticsearch1
    networks:
      - esnet
      - nginx_net

  nginx:
    image: "nginx:1.17.6-alpine"
    restart: always
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d
    ports:
      - "5000:80"
    depends_on:
      - web
    networks:
      - nginx_net

  elasticsearch1:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.3
    container_name: elasticsearch
    environment:
      - node.name=chesno-node
      - cluster.name=chesno-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    networks:
      - esnet

volumes:
  esdata:
    driver: local

networks:
  esnet:
    driver: bridge
  nginx_net:
    driver: bridge

The command docker-compose -f docker-compose.yml exec elasticsearch1 curl -XGET http://localhost:9200/_cluster/health?pretty=true returns:

{
  "cluster_name" : "chesno-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 22,
  "active_shards" : 22,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 22,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

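The index-building command uses only a small fraction of the machine's CPU and memory when it runs in docker. The cluster in docker also has more shards than the project's default elasticsearch setup outside docker, which has only 5 shards.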

Best answer

I don't remember how I solved this almost a year ago, but I have a few ideas that might help. The setup has several problems:

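  1. The official instructions describe a multi-node cluster of three nodes. For a single-node cluster you should specify discovery.type=single-node. A single-node cluster is only suitable for development. For production, I recommend moving away from docker and setting up a multi-server cluster with Ansible.
  2. It is better to use the latest version of elasticsearch.
  3. There are too many shards:

     A good practice is to ensure the amount of shards for each node stays below 20 per GB of heap that is configured.

     See this tutorial for more information.
  4. Make sure you have enough disk space and are not getting the error: flood stage disk watermark [95%] exceeded on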
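Points 3 and 4 can be turned into quick sanity checks. The following is only a minimal sketch in Python: the function names are hypothetical, the shard figures are taken from the health outputs on this page, and the disk check looks at whatever filesystem holds the given path, which is assumed (not known) to be where the esdata volume lives.

```python
import shutil

SHARDS_PER_GB_HEAP = 20      # guideline quoted in point 3
FLOOD_STAGE_PERCENT = 95.0   # threshold named in the error in point 4


def within_shard_guideline(active_shards, heap_gb, per_gb=SHARDS_PER_GB_HEAP):
    """True when the node's shard count stays below per_gb shards per GB of heap."""
    return active_shards < heap_gb * per_gb


def disk_used_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Question's cluster: 22 active shards, 1 GB heap (-Xms1g -Xmx1g).
    print(within_shard_guideline(22, 1.0))   # False
    # Answer's cluster: 6 active shards, 1024 MB heap.
    print(within_shard_guideline(6, 1.0))    # True
    # Hypothetical path; point it at the directory backing the data volume.
    print(disk_used_percent(".") >= FLOOD_STAGE_PERCENT)
```

By this rough measure the question's node is just over the quoted guideline and the answer's node is well under it. The disk figure is an approximation of what elasticsearch itself measures, so treat it as a hint rather than a diagnosis.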

Here is my current elasticsearch setup:

services:

  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    container_name: es01
    environment:
      - node.name=es01_local
      - cluster.name=es_cluster_local
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet

The command docker-compose -f docker-compose.yml exec es01 curl -XGET http://localhost:9200/_cluster/health?pretty=true returns:

{
  "cluster_name" : "es_cluster_local",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 6,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 6,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
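Both health outputs on this page report a yellow status with as many unassigned shards as active primaries. A likely reading, which the answer itself does not state, is that these are replica copies: elasticsearch does not place a replica on the same node as its primary, so a single data node leaves them unassigned. The Python sketch below encodes that reading; the function name is hypothetical and the sample values are copied from the output above.

```python
def explain_health(health):
    """Summarise a _cluster/health response in one sentence."""
    primaries = health["active_primary_shards"]
    active = health["active_shards"]
    unassigned = health["unassigned_shards"]
    data_nodes = health["number_of_data_nodes"]
    active_replicas = active - primaries

    # Yellow means every primary is allocated but some replicas are not.
    if health["status"] == "yellow" and data_nodes == 1 and active_replicas == 0:
        return (f"{primaries} primaries active, {unassigned} replica copies "
                "unassigned; a single data node cannot host a replica "
                "next to its primary")
    return f"status {health['status']}: {active} active, {unassigned} unassigned"


# Fields copied from the answer's health output.
SAMPLE = {
    "status": "yellow",
    "number_of_data_nodes": 1,
    "active_primary_shards": 6,
    "active_shards": 6,
    "unassigned_shards": 6,
}

if __name__ == "__main__":
    print(explain_health(SAMPLE))
```

If that reading is right, setting index.number_of_replicas to 0 on a development index would normally bring the status to green. That is general elasticsearch behaviour rather than something the answer recommends.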

For "django - Why does Elasticsearch in docker perform poorly when indexing a small amount of data?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59482594/
