Get all documents from an index

Is it possible to get all the documents from an index?
I tried it with Python and requests, but I always get:
query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

I have no idea how the scroll api works and the documentation isn't helpful for me either.
Could someone please help me?

Hi,
You said you tried with Python, so I'll just show you what worked for me:

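from elasticsearch import Elasticsearch

es = Elasticsearch(['http://yourElasticIP:9200/'])
# match_all returns every document; size is the number of hits per batch (max 10 000 by default)
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}
# scroll='1m' keeps the search context alive for one minute between requests
# (doc_type only applies to older Elasticsearch versions that still have mapping types)
res = es.search(index='indexname', doc_type='typename', body=doc, scroll='1m')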

This gives you a response containing the first batch of matching documents and also a field named '_scroll_id'.

So you can do

scrollId = res['_scroll_id']
# Fetch the next batch; keep the response so its _scroll_id can be used for the following call
res = es.scroll(scroll_id=scrollId, scroll='1m')

Here res is the result of your previous es.search call.
You can call es.scroll as many times as you need; just remember to update scrollId from the latest response before each new request.
Sorry if I wasn't very clear.
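Putting the steps above together, here is a minimal sketch of a loop that keeps calling es.scroll until a batch comes back empty, which is how you know every document has been returned. It assumes an elasticsearch-py client `es` and the body=-style calls used in this thread (newer client versions may prefer keyword arguments over body=, and doc_type is left out because newer Elasticsearch versions no longer have mapping types). The function name scroll_all and its parameters are hypothetical. It also frees the scroll context with es.clear_scroll instead of waiting for the 1m keep-alive to expire.

```python
from typing import Any, Dict, Iterator


def scroll_all(es: Any, index: str, page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit in `index` using the scroll API (hypothetical helper).

    `es` is assumed to be an elasticsearch-py client; only its search,
    scroll and clear_scroll methods are used.
    """
    body = {"size": page_size, "query": {"match_all": {}}}
    # The first request opens the scroll context and returns the first batch.
    res = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = res["_scroll_id"]
    try:
        while True:
            hits = res["hits"]["hits"]
            if not hits:
                # An empty batch means every document has been returned.
                break
            yield from hits
            # Ask for the next batch, then pick up the (possibly new) scroll id.
            res = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res["_scroll_id"]
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)
```

Usage would be something like `for hit in scroll_all(es, 'indexname'): print(hit['_id'])`. For what it's worth, elasticsearch-py also ships elasticsearch.helpers.scan, which wraps a scroll loop like this one.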

Thank you.
Does size indicate how many hits I get?

Yes, but as you saw, it can't be over 10 000 (the default index.max_result_window), so you have to use the scroll API. I don't think you have another choice.
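The rule behind the error in the question is simply from + size <= index.max_result_window, which defaults to 10 000. The hypothetical helper below only encodes that check. The [11000] in the error means the request asked for a window ending at hit 11 000; from=1000 with size=10000 is one illustrative combination that produces it.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def window_ok(from_: int, size: int,
              max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True if a from/size request stays inside the result window."""
    return from_ + size <= max_result_window


print(window_ok(0, 10_000))      # True: exactly at the limit
print(window_ok(1_000, 10_000))  # False: 11 000 > 10 000, like the error in the question
```

Scrolling avoids the growing from offset entirely: each es.scroll call just returns the next batch, so only the per-batch size has to stay within the limit.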

OK, so I will get the first 10 000 results. How do I get the rest?
Sorry for the stupid question, but I can't see the forest for the trees right now.

As in the code above, the es.scroll function lets you get results past the first 10 000:

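from elasticsearch import Elasticsearch

es = Elasticsearch(['http://x.x.x.x:9200/'])
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}

# First batch: hits 0 to 10 000 (index names must be lowercase in Elasticsearch)
res = es.search(index="myindex", doc_type='myType', body=doc, scroll='1m')
scroll = res['_scroll_id']
# Second batch: the next 10 000 hits
res2 = es.scroll(scroll_id=scroll, scroll='1m')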

In this example, res holds your first 10 000 hits and res2 the next 10 000. If you want hits 20 000 to 30 000, just take the new scroll id from res2 and call es.scroll again!
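The bookkeeping in that paragraph is plain arithmetic: with a batch size of size, batch k (0 for the es.search response, 1 for the first es.scroll, and so on) covers hits k*size up to (k+1)*size, with the last batch cut short at the total. A small hypothetical helper that lists those ranges, purely as an illustration:

```python
from typing import List, Tuple


def batch_ranges(total: int, size: int) -> List[Tuple[int, int]]:
    """Return the [start, end) hit range covered by each scroll batch."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


print(batch_ranges(25_000, 10_000))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

The number of es.scroll calls you need after the initial es.search is therefore len(batch_ranges(total, size)) - 1, plus one more call that comes back empty if you loop until no hits are returned.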

AHHHH... the forest, there it is.
Thank you for your help. It finally clicked.

No problem, have a nice day!

On one of my indexes I get no data, just this:
{'timed_out': False, 'hits': {'total': 1843, 'max_score': 1.0, 'hits': []}, '_shards': {'successful': 5, 'total': 5, 'failed': 0}, 'terminated_early': False, '_scroll_id': 'DnF1ZXJ5VGhlbkZldGNoBQAAAAAAARfQFm5rMUVCeUxTVDJHUm5qZ2dBQkpJMncAAAAAAAExGBZyNFIxMV93QVRqT0wtTTNoZ1dUenN3AAAAAAABF88WbmsxRUJ5TFNUMkdSbmpnZ0FCSkkydwAAAAAAAPrrFnpFTW9aaHRPUzd1X0Y0UHRORTFpSFEAAAAAAAExFxZyNFIxMV93QVRqT0wtTTNoZ1dUenN3', 'took': 2}
Any idea on that?

Sorry, I'm not sure why that is.
The only thing that comes to mind is that size was set to 0; other than that, I don't know.
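One way to follow up on the size=0 idea is to check the response programmatically. hits.total says how many documents matched (1843 in the response above), while hits.hits holds only the documents actually returned, so a non-zero total with an empty list means the request matched documents but returned none of them, which is what a search with size set to 0 gives you. The helper name matched_but_empty is hypothetical; it accepts hits.total either as a plain number, as in the response above, or as the {"value": ..., "relation": ...} object that newer Elasticsearch versions return.

```python
from typing import Any, Dict


def matched_but_empty(res: Dict[str, Any]) -> bool:
    """True if the search matched documents but returned none in hits.hits."""
    total = res["hits"]["total"]
    if isinstance(total, dict):  # newer versions: {"value": N, "relation": "eq"}
        total = total["value"]
    return total > 0 and not res["hits"]["hits"]


# Trimmed copy of the response posted above (the _scroll_id is left out).
res = {"timed_out": False,
       "hits": {"total": 1843, "max_score": 1.0, "hits": []},
       "terminated_early": False,
       "took": 2}
print(matched_but_empty(res))  # True
```

If it returns True, checking the size that was actually sent in the request body is a cheap first step.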
