elasticsearch - Querying across multiple Elasticsearch types

Reposted. Author: 行者123. Updated: 2023-12-01 04:23:25

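In Elasticsearch 5.0 I want to fetch documents that exist in multiple types (type1 AND type2 AND type3...). I know I can search across several types by listing them in the URL (e.g. type1,type2) or by filtering on the _type field, but all of those conditions are ORed (type1 OR type2). How can I get an AND condition?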

Here are two documents in my ES:

{
"_index":"cust_58e8700034fa4e368590fb1396e2641c",
"_type":"unique-fp-domains",
"_id":"n_d4dbba7309a94503b25eca735078f17c_258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
"_version":2,
"_score":1,
"_source":{
"mg_timestamp":1579866709096,
"violated-directive":"connect-src",
"fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
"time":1579866709096,
"scan-id":"n_d4dbba7309a94503b25eca735078f17c",
"blocked-uri":"play.sundaysky.com"
}
}


{
"_index":"cust_58e8700034fa4e368590fb1396e2641c",
"_type":"tag-alexa-top1k-using-csp-tld-domain",
"_id":"AW_XY4P4kmprPQ28bTUb",
"_version":1,
"_score":1,
"_source":{
"tagged-domain":"sundaysky.com",
"tag-guidance":"FP",
"additional-tag-metadata-isbase64-encoded":"eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
"project-id":2,
"fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
"scan-id":"n_d4dbba7309a94503b25eca735078f17c"
}
}

I want to use "scan-id":"n_d4dbba7309a94503b25eca735078f17c" to fetch the documents of both given types from the same index.

I tried:

{
"size": 100,
"query": {
"bool": {
"must": [
{
"bool": {
"filter": [
{
"term": {
"_type": {
"value": "tag-alexa-top1k-using-csp-tld-domain"
}
}
},
{
"term": {
"scan-id": {
"value": "n_d4dbba7309a94503b25eca735078f17c"
}
}
}
]
}
},
{
"bool": {
"filter": [
{
"term": {
"_type": {
"value": "unique-fp-domains"
}
}
},
{
"term": {
"scan-id": {
"value": "n_d4dbba7309a94503b25eca735078f17c"
}
}
}
]
}
}
]
}
}
}

But it does not work.

Best answer

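Elasticsearch is not good at joining different collections of documents. The query you tried returns nothing because every document has exactly one _type, so no single document can satisfy both _type filters at once. In your case, though, you can solve the problem with a parent-child relationship.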
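As a rough illustration of why the query in the question matches nothing, here is a small in-memory sketch. It does not talk to Elasticsearch; the helpers `term` and `must` are hypothetical stand-ins that only mimic how a `bool`/`must` of `term` filters is evaluated against one document at a time.

```python
# Trimmed stand-ins for the two documents from the question.
SCAN_ID = "n_d4dbba7309a94503b25eca735078f17c"
docs = [
    {"_type": "unique-fp-domains", "scan-id": SCAN_ID},
    {"_type": "tag-alexa-top1k-using-csp-tld-domain", "scan-id": SCAN_ID},
]


def term(field, value):
    """Hypothetical helper: mimics a term filter applied to a single document."""
    return lambda doc: doc.get(field) == value


def must(*clauses):
    """Hypothetical helper: all clauses have to hold for the SAME document."""
    return lambda doc: all(clause(doc) for clause in clauses)


# Same shape as the query in the question: two bool clauses combined with must.
query = must(
    must(term("_type", "tag-alexa-top1k-using-csp-tld-domain"), term("scan-id", SCAN_ID)),
    must(term("_type", "unique-fp-domains"), term("scan-id", SCAN_ID)),
)

# A document has exactly one _type, so it can never pass both _type filters.
matches = [doc for doc in docs if query(doc)]
print(len(matches))  # 0
```

The AND is evaluated per document, not per scan-id, which is why a join mechanism such as parent-child is needed to relate documents of different types.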

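How to query multiple index types in an AND manner?

If you have a one-to-many relationship, you can model it with parent-child. Let's assume that the type unique-fp-domains is the "parent" type and that the scan-id field is its unique identifier. Let's also assume that tag-alexa-top1k-using-csp-tld-domain is the "child" type, and that each document of type tag-alexa-top1k-using-csp-tld-domain references exactly 1 document in unique-fp-domains.

Then we should create the Elasticsearch mapping as follows: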

PUT /cust_58
{
"mappings": {
"unique-fp-domains": {},
"tag-alexa-top1k-using-csp-tld-domain": {
"_parent": {
"type": "unique-fp-domains"
}
}
}
}

Then insert the documents like this:

# "parent"
PUT /cust_58/unique-fp-domains/n_d4dbba7309a94503b25eca735078f17c
{
"mg_timestamp": 1579866709096,
"violated-directive": "connect-src",
"fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
"time": 1579866709096,
"scan-id": "n_d4dbba7309a94503b25eca735078f17c",
"blocked-uri": "play.sundaysky.com"
}

# "child"
POST /cust_58/tag-alexa-top1k-using-csp-tld-domain?parent=n_d4dbba7309a94503b25eca735078f17c
{
"tagged-domain": "sundaysky.com",
"tag-guidance": "FP",
"additional-tag-metadata-isbase64-encoded": "eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
"project-id": 2,
"fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
"scan-id": "n_d4dbba7309a94503b25eca735078f17c"
}

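Now we can query for parent objects that have any child objects associated with them. The join is on the parent ID, which we forced to equal scan-id by setting the parent document's _id manually.

The query uses has_child and looks like this: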

POST /cust_58/unique-fp-domains/_search
{
"query": {
"has_child": {
"type": "tag-alexa-top1k-using-csp-tld-domain",
"query": {
"match_all": {}
},
"inner_hits": {}
}
}
}

Note that we use inner_hits to tell Elasticsearch to also retrieve the matching "child" documents.
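If you only want parents whose children carry a particular scan-id (as in the question), the match_all inside has_child could be swapped for a term query. The following is a minimal sketch that only builds the request body; the function name `has_child_by_scan_id` is made up for illustration, and it assumes scan-id is indexed as an exact, non-analyzed value.

```python
import json


def has_child_by_scan_id(child_type, scan_id):
    """Build a has_child request body limited to children with the given scan-id."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                # Assumption: scan-id is indexed as an exact (keyword-like) value,
                # so a term query matches it verbatim.
                "query": {"term": {"scan-id": {"value": scan_id}}},
                # Ask for the matching child documents alongside each parent.
                "inner_hits": {},
            }
        }
    }


body = has_child_by_scan_id(
    "tag-alexa-top1k-using-csp-tld-domain",
    "n_d4dbba7309a94503b25eca735078f17c",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same POST /cust_58/unique-fp-domains/_search request.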

The output will look like:

    "hits": [
{
"_index": "cust_58",
"_type": "unique-fp-domains",
"_id": "n_d4dbba7309a94503b25eca735078f17c",
"_score": 1.0,
"_source": {
"mg_timestamp": 1579866709096,
"violated-directive": "connect-src",
...
},
"inner_hits": {
"tag-alexa-top1k-using-csp-tld-domain": {
"hits": {
"total": 1,
"max_score": 1.0,
"hits": [
{
"_type": "tag-alexa-top1k-using-csp-tld-domain",
"_id": "AW_xhfnnIzWDkoWd1czA",
"_score": 1.0,
"_routing": "n_d4dbba7309a94503b25eca735078f17c",
"_parent": "n_d4dbba7309a94503b25eca735078f17c",
"_source": {
"tagged-domain": "sundaysky.com",
...
}
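To use such a response on the client side, you would typically walk the parent hits and read the children from inner_hits. Below is a small sketch under the assumption that the response has already been parsed into a Python dict; `sample` is a hand-trimmed copy of the output above and `pair_parents_with_children` is a hypothetical helper name.

```python
def pair_parents_with_children(response, child_type):
    """Return (parent_id, [child_id, ...]) for every parent hit in the response."""
    pairs = []
    for parent in response["hits"]["hits"]:
        # inner_hits is keyed by the child type name used in the has_child query.
        inner = parent.get("inner_hits", {}).get(child_type, {})
        children = inner.get("hits", {}).get("hits", [])
        pairs.append((parent["_id"], [child["_id"] for child in children]))
    return pairs


# Trimmed stand-in for the response shown above (most _source fields omitted).
sample = {
    "hits": {
        "hits": [
            {
                "_type": "unique-fp-domains",
                "_id": "n_d4dbba7309a94503b25eca735078f17c",
                "_source": {"violated-directive": "connect-src"},
                "inner_hits": {
                    "tag-alexa-top1k-using-csp-tld-domain": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_type": "tag-alexa-top1k-using-csp-tld-domain",
                                    "_id": "AW_xhfnnIzWDkoWd1czA",
                                    "_source": {"tagged-domain": "sundaysky.com"},
                                }
                            ],
                        }
                    }
                },
            }
        ]
    }
}

pairs = pair_parents_with_children(sample, "tag-alexa-top1k-using-csp-tld-domain")
print(pairs)
```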

What are the drawbacks of using parent-child?

  • The parent ID must be unique
  • Joins are possible only on the parent ID
  • Some performance overhead:

    If you care about query performance you should not use this query.

  • To enable a parent-child relationship you have to change the mapping and reindex the existing data

Other important things to consider

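In Elasticsearch 6, support for multiple mapping types per index was removed, so the _parent mapping shown above cannot be used on indices created in 6.x or later. The good news is that the join datatype, available since Elasticsearch 5.6, covers the same use case.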
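For reference, here is a hedged sketch of what the same parent-child model might look like with the join datatype. It only builds the request bodies as Python dicts. The field name `doc_relation` and the helper functions are assumptions made for illustration; the mapping is written in the typeless form used by Elasticsearch 7+, whereas on 6.x the properties block would sit under the index's single type name.

```python
import json

PARENT = "unique-fp-domains"
CHILD = "tag-alexa-top1k-using-csp-tld-domain"
JOIN_FIELD = "doc_relation"  # hypothetical field name, not from the original answer

# Index creation body: one join field declaring PARENT as the parent of CHILD.
index_body = {
    "mappings": {
        "properties": {
            JOIN_FIELD: {"type": "join", "relations": {PARENT: CHILD}}
        }
    }
}


def parent_doc(source):
    """Mark a document as a parent by naming its relation in the join field."""
    return {**source, JOIN_FIELD: {"name": PARENT}}


def child_doc(source, parent_id):
    """Mark a document as a child of parent_id.

    The child has to be indexed with routing=parent_id so that it lands
    on the same shard as its parent.
    """
    return {**source, JOIN_FIELD: {"name": CHILD, "parent": parent_id}}


scan_id = "n_d4dbba7309a94503b25eca735078f17c"
parent = parent_doc({"scan-id": scan_id, "blocked-uri": "play.sundaysky.com"})
child = child_doc({"scan-id": scan_id, "tagged-domain": "sundaysky.com"}, parent_id=scan_id)
print(json.dumps(index_body, indent=2))
print(json.dumps(child, indent=2))
```

With this layout both kinds of document live in the same index, and a has_child query with "type" set to the child relation name plays the role of the query shown earlier.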

Overall, Elasticsearch is not very good at managing relationships between objects, but there are a few ways to deal with them.

Hope that helps!

This post on elasticsearch - Querying across multiple Elasticsearch types is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59929996/
