How to retrieve all records from an index

Hi,
I am trying to retrieve all records from an Elasticsearch index:
import elasticsearch
from elasticsearch import Elasticsearch
ret_size = 0
cluster = "dvfarmil"
es = Elasticsearch([{'host': 'YYYYYYY', 'port': 80}])
res = es.search(index='XXXXXXXXXX',
                scroll='50m',
                body={"size": 10000, "query": {"bool": {"must": [{"match": {"clustername": "OOOOO"}}]}}})
sid = res['_scroll_id']
while ret_size > 0:
    res = es.scroll(scroll_id=sid, scroll='50m')
    ret_size += len(res['hits']['hits'])
    if sid == res['_scroll_id']:
        exit(0)
    else:
        sid = res['_scroll_id']

The scroll_id isn't changing, even though I know the index contains tens of thousands of records.
Please advise.
Mahmood
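A minimal sketch of a scroll loop, assuming the same 7.x-style elasticsearch-py calls used in the question (`es.search(..., scroll=..., body=...)`, `es.scroll(scroll_id=..., scroll=...)`) plus `es.clear_scroll(scroll_id=...)`. It differs from the question's code in two ways: it keeps going until a page comes back empty (in the question, `ret_size` starts at 0, so the `while` body never runs), and it always reuses the most recent `_scroll_id` instead of treating an unchanged ID as the end, because Elasticsearch does not necessarily issue a new scroll ID on every call. The function name `scroll_all` and its defaults are hypothetical. elasticsearch-py also ships `elasticsearch.helpers.scan`, which wraps a similar loop.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="5m"):
    """Yield every hit matching `query` via the scroll API.

    Assumes `es` is an elasticsearch-py client using 7.x-style calls.
    `keep_alive` is how long the search context stays open between calls.
    """
    res = es.search(index=index, scroll=keep_alive,
                    body={"size": page_size, "query": query})
    sid = res["_scroll_id"]
    try:
        while True:
            hits = res["hits"]["hits"]
            if not hits:  # an empty page means every match has been returned
                break
            yield from hits
            res = es.scroll(scroll_id=sid, scroll=keep_alive)
            sid = res["_scroll_id"]  # use the latest ID, whether or not it changed
    finally:
        es.clear_scroll(scroll_id=sid)  # release the server-side search context


# Hypothetical usage with the client from the question:
# for hit in scroll_all(es, 'XXXXXXXXXX', {"match": {"clustername": "OOOOO"}}):
#     process(hit["_source"])
```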

Do you mean listing all documents without pagination?
You can see how many documents match from hits.total, right?

I want to retrieve all docs. I don't know how many there are, but there are certainly more than 300,000.
What is the correct way to do it?
What I have done isn't working.

I prefer to run this query: { "query": { "match_all": {} }, "size": 5000, "from": 0 }
and then loop over the following pages by changing the from value.
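The from/size paging described above can be sketched as follows, again assuming a 7.x-style elasticsearch-py client `es`. The function name and the `max_window` parameter are hypothetical; `max_window` is assumed to match the default `index.max_result_window` of 10000, because Elasticsearch rejects from/size requests where `from + size` exceeds that window. The loop therefore stops at the window, and anything deeper needs the scroll API.

```python
def paginate_from_size(es, index, query, size=5000, max_window=10000):
    """Yield matching hits page by page by advancing `from`.

    `max_window` is assumed to equal index.max_result_window (default 10000);
    requests with from + size above it are rejected, so the loop stops there.
    """
    start = 0
    while start < max_window:
        page = min(size, max_window - start)  # keep from + size <= max_window
        res = es.search(index=index,
                        body={"query": query, "from": start, "size": page})
        hits = res["hits"]["hits"]
        if not hits:  # ran past the last matching document
            break
        yield from hits
        start += len(hits)  # next page starts right after the last hit returned
```

With the numbers in this post (size 5000, default window 10000) this yields at most two pages, i.e. the first 10,000 hits.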

Anyway, you can check the exact total number of documents in the response, e.g. { "took": 94, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 852705, "max_score": 1, ... } }. In my case I have 852705 documents.
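The response above returns `hits.total` as a plain number. From Elasticsearch 7.0 on, `hits.total` is an object such as `{"value": 10000, "relation": "gte"}`, and by default the count is only exact up to 10,000 unless the request sets `"track_total_hits": true`; `es.count(index=..., body={"query": ...})` is another way to get an exact figure. A small hypothetical helper, `total_hits`, that reads either shape:

```python
def total_hits(response):
    """Return (count, is_exact) from a search response's hits.total."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x+ shape: {"value": n, "relation": "eq"} or {"value": n, "relation": "gte"}
        return total["value"], total.get("relation") == "eq"
    # Older shape, as in the response above: a plain integer, treated here as exact
    return total, True
```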

Something isn't working. The total is 2033951, but only the first request succeeds:
body={"size": 10000, "from": 0, "query": {"bool": {"must": [{"match": {"clustername": "test1"}}]}}}  # worked
body={"size": 10000, "from": 10001, "query": {"bool": {"must": [{"match": {"clustername": "test1"}}]}}}  # failed
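The second request most likely fails because of the result window rather than the query: with the default `index.max_result_window` of 10000, Elasticsearch only accepts from/size requests where `from + size <= 10000`. The first request sits exactly at the limit (0 + 10000 = 10000); the second asks for 10001 + 10000 = 20001. (Since `from` is a zero-based offset, the page after the first 10,000 hits would also start at 10000, not 10001.) The function name below is hypothetical; it only makes the arithmetic explicit.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # default value of the index.max_result_window setting


def within_result_window(from_, size, max_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True if a from/size search request stays inside the result window."""
    return from_ + size <= max_window


print(within_result_window(0, 10000))      # True:  0 + 10000 = 10000
print(within_result_window(10001, 10000))  # False: 10001 + 10000 = 20001
```

Raising `index.max_result_window` is possible, but deep from/size pages get progressively more expensive; for about two million documents the scroll API is the usual route.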
