How to retrieve all records from an index

(Mahmood Majadly) #1

I am trying to retrieve all records from an Elasticsearch index:
import elasticsearch
from elasticsearch import Elasticsearch

ret_size = 0
cluster = "dvfarmil"
es = Elasticsearch([{'host': 'YYYYYYY', 'port': 80}])
# Opening of the search call reconstructed; the index argument is missing from the post.
res = es.search(
    scroll='50m',
    body={"size": 10000, "query": {"bool": {"must": [{"match": {"clustername": "OOOOO"}}]}}})
while ret_size > 0:
    res = es.scroll(scroll_id=sid, scroll='50m')
    ret_size += len(res['hits']['hits'])
    if sid == res['_scroll_id']:


The scroll_id isn't changing, and I know that the index contains tens of thousands of records.
Please advise.

(Ferry Ardhana) #2

Do you mean listing all documents without pagination?
You can find out the number of documents in your cluster, right?

(Mahmood Majadly) #3

I want to retrieve all docs. I don't know how many there are, but there are certainly more than 300,000.
What is the correct way to do it?
What I have done isn't working.

(Ferry Ardhana) #4

I prefer to run this query: { "query": { "match_all": {} }, "size": 5000, "from": 0 }
Then I loop over the following pages by modifying the "from" clause for each one.

Anyway, you can check the exact total number of docs in the response: { "took": 94, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 852705, "max_score": 1,...}} In my case I have 852705 documents.
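
The paging idea described above could be sketched roughly as follows. This is only an illustration: `page_bodies` is a hypothetical helper name, and the total is assumed to come from `hits.total` in a first response like the one shown.

```python
def page_bodies(total, size=5000, query=None):
    """Yield one search body per page, advancing "from" by `size` each time."""
    # Assumption: match_all is used when no query is given, as in the post above.
    query = query or {"match_all": {}}
    for offset in range(0, total, size):
        yield {"query": query, "size": size, "from": offset}


# Hypothetical usage, assuming `es` is an Elasticsearch client and "my-index" is a placeholder:
#   total = es.search(index="my-index", body={"size": 0})["hits"]["total"]
#   for body in page_bodies(total):
#       res = es.search(index="my-index", body=body)
```

Each generated body differs only in its "from" offset, so the loop is easy to stop or resume at a known page.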

(Mahmood Majadly) #5

Something isn't working. The total is 2033951:
body={"size":10000,"from":0, "query": {"bool": {"must": [ {"match": {"clustername":"test1"}}]}}}) // worked
body={"size":10000,"from":10001, "query": {"bool": {"must": [ {"match": {"clustername":"test1"}}]}}}) // failed
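
A likely reason the second request fails: by default, "from" + "size" may not exceed the `index.max_result_window` setting (10000), so deep pages are rejected. For result sets this large the scroll API is the usual alternative. Below is a minimal sketch, assuming an older `elasticsearch` Python client like the one used in this thread, where `search` accepts `body` and `scroll`. `scroll_all` is a hypothetical helper name. Note that the scroll ID is not guaranteed to change between calls, so an unchanged ID is not an error by itself; the loop ends when a page comes back empty.

```python
def scroll_all(es, index, body, scroll="2m"):
    """Yield every hit matching `body`, following the scroll cursor until a page is empty."""
    res = es.search(index=index, scroll=scroll, body=body)
    sid = res["_scroll_id"]
    hits = res["hits"]["hits"]
    try:
        while hits:
            for hit in hits:
                yield hit
            # Always pass the most recently returned scroll ID.
            res = es.scroll(scroll_id=sid, scroll=scroll)
            sid = res["_scroll_id"]
            hits = res["hits"]["hits"]
    finally:
        # Release the server-side scroll context once iteration stops.
        es.clear_scroll(scroll_id=sid)


# Hypothetical usage ("my-index" is a placeholder, not a name from this thread):
#   query = {"size": 10000, "query": {"match": {"clustername": "test1"}}}
#   for hit in scroll_all(es, "my-index", query):
#       handle(hit["_source"])
```

The client library also ships `elasticsearch.helpers.scan`, which wraps the same search-then-scroll pattern.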
