
performance - ElasticSearch speed problem

Reposted. Author: 行者123. Updated: 2023-12-04 03:27:21

We use ElasticSearch for 15 million records. The records are split across indices of different sizes; some of the indices hold 1.5 million records.

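We have plenty of RAM (80 GB), and the whole 60 GB index fits in RAM. For ElasticSearch response times, our statistics show that query execution takes 7 ms, yet we only get the result back from ElasticSearch after 300 ms. What is the problem here? Where can we look, and where is our time going?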

ES setup:

2 Nodes on 2 different hosts

Each index has 1 primary shard; we have 2 shards per index

3,762 Total Shards

3,762 Successful Shards

85 Indices

20,347,989 Documents

40.5GB Size


elasticsearch.yml:

index.cache.field.type: soft

indices.cache.filter.size: 50%

index.fielddata.cache: soft

index.cache.field.expire: 60m

indices.fielddata.cache.size: 50%

indices.fielddata.cache.expire : 60m

index.store.type: mmapfs

transport.tcp.compress: true;

bootstrap.mlockall: true

index.search.slowlog.threshold.query.warn: 10s

index.search.slowlog.threshold.query.info: 5s

index.search.slowlog.threshold.query.debug: 2s

index.search.slowlog.threshold.query.trace: 500ms

Example: we have an index for the country DE, and it holds 1.5M documents. This index has 2 shards.

ES startup:

/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms32g -Xmx32g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.2.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch

OS:

24 Cores

80 GB of RAM

60 GB are used

Disk space: 1,2 TB

350 GB used / 780GB free

Disc type: SAS

MySQL is also running on this machine

Example query: a search for a city, where we pass the location_id to ES:

{
  "query": {
    "match_all": {}
  },
  "sort": {},
  "facets": {
    "location_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }]
        }
      },
      "terms": {
        "field": "location_facet",
        "all_terms": true,
        "size": 100,
        "script": "doc['geo_point'].empty ? null : ceil(doc['geo_point'].arcDistanceInKm(-33.42628, -70.56656)) + '|' + doc['location_facet'].value\n + '|' + doc['location_id'].value"
      }
    },
    "company_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }, {
            "terms": {
              "location_id": [
                25717
              ]
            }
          }]
        }
      },
      "terms": {
        "field": "company_facet",
        "order": "count",
        "script": "doc['company_facet'].value + '|' + doc['company_id'].value"
      }
    },
    "job_type_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }]
        }
      },
      "terms": {
        "field": "jobtype_facet",
        "order": "term",
        "all_terms": true
      }
    }
  },
  "filter": {},
  "size": 10,
  "from": 0,
  "explain": false,
  "highlight": {
    "order": "score",
    "require_field_match": false,
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fields": {
      "description": {
        "type": "fvh",
        "force_source": true,
        "no_match_size": 200,
        "index_options": "offsets",
        "fragment_size": 200,
        "number_of_fragments": 2,
        "matched_fields": [
          "description",
          "title"
        ]
      }
    }
  }
}

Response time for this query: > 400 ms, which is very slow. We also disabled the facets, but nothing changed.
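One way to narrow down where the time goes is to compare the `took` value in the search response body (the time in milliseconds that Elasticsearch itself reports) with the wall-clock time seen by the client. The following is only a rough sketch: it assumes the 7 ms figure from the question corresponds to `took`, and `send` and `fake_send` are hypothetical names standing in for whatever code actually issues the HTTP request.

```python
import json
import time
from typing import Callable, Dict


def time_breakdown(send: Callable[[], bytes]) -> Dict[str, float]:
    """Split client-observed latency into ES-reported time and the rest.

    `send` is a hypothetical callable that performs one search request
    and returns the raw JSON response body as bytes.
    """
    start = time.perf_counter()
    body = send()                      # request + server work + transfer
    received = time.perf_counter()
    parsed = json.loads(body)          # client-side JSON decoding
    done = time.perf_counter()

    took_ms = float(parsed["took"])    # time reported by Elasticsearch
    wall_ms = (received - start) * 1000.0
    return {
        "took_ms": took_ms,
        "wall_ms": wall_ms,
        # Everything not covered by `took`: network, serialization,
        # connection setup, queueing, and so on.
        "overhead_ms": wall_ms - took_ms,
        "parse_ms": (done - received) * 1000.0,
        "response_bytes": float(len(body)),
    }


def fake_send() -> bytes:
    """Stand-in for a real request, so the sketch runs without a cluster."""
    time.sleep(0.02)
    return b'{"took": 7, "hits": {"total": 0, "hits": []}}'


if __name__ == "__main__":
    for name, value in time_breakdown(fake_send).items():
        print(f"{name}: {value:.2f}")
```

If `overhead_ms` dominates, the time is probably being spent outside query execution, for example in fetching and highlighting large `description` fields, in response size, or in the client and network path. This is a starting point for measurement, not a diagnosis.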

Best answer

For a single point, the "geo_bounding_box" filter may be faster than the "geo_distance" filter.
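To illustrate the suggestion, here is a minimal sketch that derives a bounding box enclosing the 50 km circle from the question and builds a `geo_bounding_box` filter body for the `geo_point` field. It assumes a spherical Earth with a mean radius of 6371 km and a center that is not near the poles or the antimeridian; the function name `bounding_box_filter` is made up for this example.

```python
import math
from typing import Any, Dict

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def bounding_box_filter(field: str, lat: float, lon: float,
                        distance_km: float) -> Dict[str, Any]:
    """Build a geo_bounding_box filter that encloses a circle.

    Assumes the circle does not touch a pole or cross the antimeridian.
    """
    angular = distance_km / EARTH_RADIUS_KM          # radius in radians
    dlat = math.degrees(angular)
    # Largest longitude offset of any point on the circle (spherical model).
    dlon = math.degrees(
        math.asin(math.sin(angular) / math.cos(math.radians(lat)))
    )
    return {
        "geo_bounding_box": {
            field: {
                "top_left": {"lat": lat + dlat, "lon": lon - dlon},
                "bottom_right": {"lat": lat - dlat, "lon": lon + dlon},
            }
        }
    }


# Center and radius taken from the example query in the question.
box = bounding_box_filter("geo_point", -33.42628, -70.56656, 50.0)
```

The resulting dictionary could replace each `geo_distance` clause inside the `must` arrays. Note that the box is larger than the circle, so documents in the corners lie more than 50 km from the center; whether that is acceptable depends on the application, and whether it is actually faster needs to be measured on the cluster in question.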

Regarding performance - ElasticSearch speed problem, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24707645/
