python-3.x - How to iterate over an index field to add a field from another index


Reposted by: 行者123 Updated: 2023-12-02 23:49:49

I'm fairly new to Elasticsearch, so I'm here hoping for some advice.
I have two Elasticsearch indices built from two different CSV files.

index_1 has this mapping:

{
    'settings': {
        'number_of_shards': 3
    },
    'mappings': {
        'properties': {
            'place': {'type': 'keyword'},
            'address': {'type': 'keyword'},
        }
    }
}

This index holds about 400,000 documents.
index_2, which is much smaller (about 50 documents), has the following mapping:

{
    'settings': {
        'number_of_shards': 1
    },
    'mappings': {
        'properties': {
            'place': {'type': 'text'},
            'address': {'type': 'keyword'},
        }
    }
}

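The "place" field in index_2 contains all the unique values of the "place" field in index_1.
In both indices, the "address" field is a postal code stored as a keyword, with the structure 0000AZ.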
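
Since the lookup depends on the postal codes being well formed, a quick format check before indexing can help. This is only a sketch: it assumes that "0000AZ" means four digits followed by two uppercase letters, which the question does not spell out, and `is_valid_postal_code` is a made-up helper name.

```python
import re

# Assumption: "0000AZ" stands for four digits followed by two uppercase letters.
POSTAL_CODE_PATTERN = re.compile(r"\d{4}[A-Z]{2}")


def is_valid_postal_code(value):
    """Return True if the whole string matches the assumed 0000AZ structure."""
    return POSTAL_CODE_PATTERN.fullmatch(value) is not None
```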

Based on the keyword in the "place" field of index_1, I want to assign the matching value of the "address" field from index_2.

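I tried using the pandas library, but the index_1 file is too large. I also tried building a module based on pandas and elasticsearch, without much success, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, because these indices will be used for further analysis later.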

Best answer

If I understand correctly, it sounds like you want to use updateByQuery.

The request body should look like this:

{
   'query': {'term': {'place': "placeToMatch"}},
   'script': 'ctx._source.address = "updatedZipCode"'
}

This updates the address field of every document whose place matches.
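
In Python, the same request could be sketched roughly as follows. The helper names are made up for illustration, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance created elsewhere. The zip code is passed through script `params` rather than pasted into the script string, which avoids quoting problems. Depending on the client version, `body=` may emit a deprecation warning; `query=` and `script=` can then be passed as keyword arguments instead.

```python
def build_update_body(place, zip_code):
    """Build an update-by-query body that sets `address` where `place` matches."""
    return {
        "query": {"term": {"place": place}},
        "script": {
            "lang": "painless",
            # The value comes from params, so no string escaping is needed.
            "source": "ctx._source.address = params.address",
            "params": {"address": zip_code},
        },
    }


def update_address_for_place(client, index, place, zip_code):
    """Run the update and return how many documents were updated.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    """
    response = client.update_by_query(
        index=index, body=build_update_body(place, zip_code)
    )
    return response["updated"]
```

A call might look like `update_address_for_place(client, "index_1", "placeToMatch", "updatedZipCode")`.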

Edit:

So what we want to do is call updateByQuery while iterating over all the documents in index_2.

Step one: fetch all documents from index_2 with a basic search.

{
   "index": 'index_2',
   "size": 100, // get all documents; once size is over 10,000 you'll have to paginate.
   "body": {"query": {"match_all": {}}}
}
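
A rough Python version of this step, assuming `client` is an `elasticsearch.Elasticsearch` instance and that every document in index_2 has both a `place` and an `address` field. `fetch_place_to_address` is a hypothetical helper name. It collapses the hits into a plain dict, which is convenient for the roughly 50 documents described in the question.

```python
def fetch_place_to_address(client, index="index_2", size=100):
    """Return a {place: address} dict built from one match_all search.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    A single search only works while the index has fewer documents than `size`.
    """
    response = client.search(
        index=index,
        body={"query": {"match_all": {}}, "size": size},
    )
    lookup = {}
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        lookup[source["place"]] = source["address"]
    return lookup
```

For an index too large for a single search, `elasticsearch.helpers.scan` can iterate over all hits instead.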

Now we iterate over all the results and run updateByQuery for each one:

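// pseudocode
doc = response.hits.hits[i]

// update-by-query request: match on place, write the address from index_2
{
  index: 'index_1',
  body: {
    'query': {'term': {'place': doc._source.place}},
    'script': {
      'source': 'ctx._source.address = params.address',
      'params': {'address': doc._source.address}
    }
  }
}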
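
The loop could be sketched in Python like this. It is only an outline under a few assumptions: `client` is an `elasticsearch.Elasticsearch` instance, `place_to_address` is a dict of place to postal code built from the index_2 documents, and `apply_addresses` is a hypothetical name. It issues one update-by-query per place, so about 50 requests for the data described in the question.

```python
def apply_addresses(client, place_to_address, target_index="index_1"):
    """Run one update-by-query per place and return {place: updated_count}.

    `client` is assumed to be an elasticsearch.Elasticsearch instance;
    `place_to_address` maps each place from index_2 to its postal code.
    """
    updated = {}
    for place, address in place_to_address.items():
        body = {
            # `place` is a keyword field in index_1, so an exact term match works.
            "query": {"term": {"place": place}},
            "script": {
                "lang": "painless",
                "source": "ctx._source.address = params.address",
                "params": {"address": address},
            },
        }
        response = client.update_by_query(index=target_index, body=body)
        updated[place] = response["updated"]
    return updated
```

The returned counts give a quick sanity check: a place with 0 updates has no exact keyword match in index_1.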

Regarding "python-3.x - How to iterate over an index field to add a field from another index", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58467708/