
python - Scroll in Python Elasticsearch not working

Reposted. Author: 行者123. Updated: 2023-12-03 17:09:38

When querying Elasticsearch, I am trying to scroll through all documents with Python so that I can retrieve more than 10K results:

from elasticsearch import Elasticsearch
es = Elasticsearch(ADDRESS, port=PORT)

result = es.search(
    index="INDEX",
    body=es_query,
    size=10000,
    scroll="3m")

scroll_id = result['_scroll_id']
scroll_size = result["hits"]["total"]
counter = 0
print('total items= ' + str(scroll_size))

while(scroll_size > 0):
    counter += len(result['hits']['hits'])

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']

print('found = ' + str(counter))
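The problem is that sometimes counter (the total number of results at the end of the program) is smaller than result["hits"]["total"]. Why is that? Why doesn't scroll iterate over all the results?
Elasticsearch version: 5.6
Lucene version: 6.6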

Best answer

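If I'm not mistaken, you are adding the initial result["hits"]["total"] to your counter in the first iteration of the while loop, but you should only add the number of hits actually retrieved: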

scroll_id = result['_scroll_id']
total = result["hits"]["total"]

print('total = %d' % total)

scroll_size = len(result["hits"]["hits"])  # this is the current 'page' size
counter = 0

while scroll_size > 0:
    counter += scroll_size

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
    scroll_size = len(result['hits']['hits'])

print('counter = %d' % counter)
assert counter == total
In fact, you don't need to store the scroll size separately. A more concise while loop would be:
while len(result['hits']['hits']):
    counter += len(result['hits']['hits'])

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
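
For reference, the same loop can be wrapped in a small generator that stops on the first empty page and releases the scroll context when it is done. This is only a sketch, not the asker's or the library's own code: the names iter_scroll_hits and FakeClient are hypothetical, FakeClient is a stand-in that mimics just the search, scroll and clear_scroll calls so the loop can be run without a cluster, and the "2m" keep-alive and the page size are assumed values rather than recommendations from the original posts.

```python
def iter_scroll_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield every hit for `query`, one scroll page at a time.

    `es` is expected to behave like an elasticsearch-py client
    (search, scroll and clear_scroll methods). The keep_alive value
    is an assumption; it is reused for every request.
    """
    result = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    scroll_id = result["_scroll_id"]
    try:
        # An empty page means the scroll is exhausted.
        while result["hits"]["hits"]:
            for hit in result["hits"]["hits"]:
                yield hit
            result = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = result["_scroll_id"]
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for a client, for offline testing only."""

    def __init__(self, docs):
        self._docs = docs
        self._pos = 0
        self._size = 0
        self.cleared = False

    def _page(self):
        # Return the next slice of documents in a response-shaped dict.
        page = self._docs[self._pos:self._pos + self._size]
        self._pos += len(page)
        return {
            "_scroll_id": "fake-scroll-id",
            "hits": {"total": len(self._docs), "hits": page},
        }

    def search(self, index, body, size, scroll):
        self._size = size
        self._pos = 0
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


fake = FakeClient([{"_id": str(i)} for i in range(25)])
counter = sum(1 for _ in iter_scroll_hits(fake, "INDEX", {"query": {"match_all": {}}}, page_size=10))
print("counter = %d" % counter)
```

With a real cluster you would pass the es client from the question in place of fake. Note that on Elasticsearch 7 and later result["hits"]["total"] is an object with a "value" field rather than a plain integer, so a comparison such as counter == total would need adjusting there.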

Regarding "python - Scroll in Python Elasticsearch not working", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66315162/
