
arrays - Retrieve a specified number of array elements from an Elasticsearch query

Reposted. Author: 行者123. Last updated: 2023-12-02 22:40:40

I have an index in Elasticsearch, and one of its records contains an array.
Say the field name is "sample" and the array is:

["abc","xyz","mnp".....]



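Is there a query that lets me specify how many elements to retrieve from the array?
For example, I want the retrieved record to contain only the first 2 elements of the sample array.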

Best answer

Assuming your documents hold an array of strings, I have two ideas that may help you.

PUT /arrayindex/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "spacelyzer": {
            "tokenizer": "whitespace"
          },
          "commalyzer": {
            "type": "custom",
            "tokenizer": "commatokenizer",
            "char_filter": "square_bracket"
          }
        },
        "tokenizer": {
          "commatokenizer": {
            "type": "pattern",
            "pattern": ","
          }
        },
        "char_filter": {
          "square_bracket": {
            "type": "mapping",
            "mappings": [
              "[=>",
              "]=>"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "array_set": {
      "properties": {
        "array_space": {
          "analyzer": "spacelyzer",
          "type": "string"
        },
        "array_comma": {
          "analyzer": "commalyzer",
          "type": "string"
        }
      }
    }
  }
}

POST /arrayindex/array_set/1
{
  "array_space": "qwer qweee trrww ooenriwu njj"
}

POST /arrayindex/array_set/2
{
  "array_comma": "[qwer,qweee,trrww,ooenriwu,njj]"
}

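The DSL above accepts two representations of an array. One is a whitespace-separated string in which each word is one element of the array. The other is the array form you specified, stored as a string. The second form arises naturally in Python: if you index such a list as a string field, it is converted to a string, i.e. ["abc","xyz","mnp".....] becomes "["abc","xyz","mnp".....]".
spacelyzer tokenizes on whitespace; commalyzer tokenizes on commas and strips [ and ] from the string.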
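
To make the effect of the two analyzers concrete, here is a rough pure-Python approximation of what they produce. This is only a sketch: the function names whitespace_tokens and comma_tokens are made up for illustration, and the code mimics the tokenization described above rather than calling Elasticsearch itself.

```python
def whitespace_tokens(text):
    """Approximate spacelyzer: split on whitespace, pair each token with its position."""
    return list(enumerate(text.split()))


def comma_tokens(text):
    """Approximate commalyzer: strip '[' and ']' (the char filter), then split on ','."""
    stripped = text.replace("[", "").replace("]", "")
    return list(enumerate(stripped.split(",")))


print(whitespace_tokens("qwer qweee"))   # [(0, 'qwer'), (1, 'qweee')]
print(comma_tokens("[qwer,qweee]"))      # [(0, 'qwer'), (1, 'qweee')]
```

In both cases the position assigned to a token is its index in the original array, which is what the rest of the answer relies on.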

Now, if you use the Termvector API like this:
GET arrayindex/array_set/1/_termvector
{
"fields" : ["array_space", "array_comma"],
"term_statistics" : true,
"field_statistics" : true
}

GET arrayindex/array_set/2/_termvector
{
"fields" : ["array_space", "array_comma"],
"term_statistics" : true,
"field_statistics" : true
}

you can read the position of each element directly from the responses. For example, to find the position of "njj", use:
  • termvector_response["term_vectors"]["array_comma"]["terms"]["njj"]["tokens"][0]["position"]
  • termvector_response["term_vectors"]["array_space"]["terms"]["njj"]["tokens"][0]["position"]

Both give you 4, which is the element's actual index in the array. I suggest you go with the whitespace design.
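
As a small illustration of that lookup, the sketch below walks the same path through a response dictionary. The sample_response here is a hand-written, heavily trimmed stand-in that keeps only the keys used in the two expressions above; it was not captured from a real cluster, and term_position is a hypothetical helper name.

```python
def term_position(response, field, term):
    """Return the position of the first occurrence of `term` in `field`."""
    tokens = response["term_vectors"][field]["terms"][term]["tokens"]
    return tokens[0]["position"]


# Hand-built stand-in for a term vector response (assumed shape, trimmed).
sample_response = {
    "term_vectors": {
        "array_space": {
            "terms": {
                "qwer": {"tokens": [{"position": 0}]},
                "qweee": {"tokens": [{"position": 1}]},
                "trrww": {"tokens": [{"position": 2}]},
                "ooenriwu": {"tokens": [{"position": 3}]},
                "njj": {"tokens": [{"position": 4}]},
            }
        }
    }
}

print(term_position(sample_response, "array_space", "njj"))  # 4
```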

Python code that prints the first few elements this way could be:
from elasticsearch import Elasticsearch

ES_HOST = {"host": "localhost", "port": 9200}
ES_CLIENT = Elasticsearch(hosts=[ES_HOST], timeout=180)

def getTermVector(doc_id):
    # Fetch term vectors (including token positions) for both array fields.
    # Later client releases name this method termvectors.
    return ES_CLIENT.termvector(
        index="arrayindex",
        doc_type="array_set",
        id=doc_id,
        field_statistics=True,
        fields=['array_space', 'array_comma'],
        term_statistics=True)

def getElements(num, array_no):
    # Print the first `num` elements of the array stored in document `array_no`.
    all_terms = getTermVector(array_no)['term_vectors']['array_space']['terms']
    for i in range(num):
        for term in all_terms:
            for jsons in all_terms[term]['tokens']:
                if jsons['position'] == i:
                    print(term, "@ index", i)


getElements(3, 1)

# qwer @ index 0
# qweee @ index 1
# trrww @ index 2
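
The nested loops rescan every term once per requested index. A possible alternative, sketched here under the assumption that the "terms" mapping has the shape used in the code above, is to invert the mapping once and read the positions in order. first_elements is a hypothetical helper, not part of the original answer or of the Elasticsearch client.

```python
def first_elements(all_terms, num):
    """Return the terms at positions 0..num-1, in array order."""
    by_position = {}
    for term, info in all_terms.items():
        # A term that occurs several times in the array has several tokens.
        for token in info["tokens"]:
            by_position[token["position"]] = term
    return [by_position[p] for p in sorted(by_position) if p < num]


# Hand-built "terms" mapping for the array "qwer qweee trrww ooenriwu njj".
terms = {
    "njj": {"tokens": [{"position": 4}]},
    "ooenriwu": {"tokens": [{"position": 3}]},
    "qweee": {"tokens": [{"position": 1}]},
    "qwer": {"tokens": [{"position": 0}]},
    "trrww": {"tokens": [{"position": 2}]},
}

print(first_elements(terms, 3))  # ['qwer', 'qweee', 'trrww']
```

This version also returns the elements as a list instead of printing them, which makes it easier to attach the truncated array to the record you hand back to the caller.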

Regarding "arrays - Retrieve a specified number of array elements from an Elasticsearch query", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32076235/
