Get all documents from an index

McElroy · May 24, 2017, 12:48pm

Is it possible to get all the documents from an index?
I tried it with Python and requests, but I always get:
query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

I have no idea how the scroll API works, and the documentation isn't helping me either.
Could someone please help me?

Heinmci · May 24, 2017, 1:01pm

Hi,
You said you tried with Python, so I'll just show you what worked for me:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://yourElasticIP:9200/'])
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}
# scroll='1m' keeps the search context alive for one minute
res = es.search(index='indexname', doc_type='typename', body=doc, scroll='1m')

Then you get a response with your matching documents and also an attribute named '_scroll_id'.

So you can do

scrollId = res['_scroll_id']
# keep the return value: it holds the next batch of hits and a new '_scroll_id'
res2 = es.scroll(scroll_id=scrollId, scroll='1m')

Here res is the result of your previous es.search call.
You can call es.scroll as many times as you need; just remember to update scrollId from the latest response before each new request. Once a response comes back with an empty hits list, you have everything.
Sorry if I wasn't very clear
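
To make the "call es.scroll as many times as you need" part concrete, here is a minimal sketch of the loop. It assumes es is an elasticsearch-py client like the one above; the function name iter_all_hits and the page_size default are made up for illustration, and doc_type is left out for brevity. Treat it as one possible way to write it, not the official recipe.

```python
def iter_all_hits(es, index, page_size=1000, keep_alive='1m'):
    """Yield every hit in `index` by following the scroll cursor.

    `es` is assumed to be an elasticsearch-py client (or any object with
    the same search/scroll/clear_scroll methods). The function name and
    the page_size default are illustrative, not part of the library.
    """
    body = {'size': page_size, 'query': {'match_all': {}}}
    res = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = res['_scroll_id']
    try:
        # An empty hits list means the scroll is exhausted.
        while res['hits']['hits']:
            for hit in res['hits']['hits']:
                yield hit
            # Ask for the next page, then keep the id from the newest response.
            res = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res['_scroll_id']
    finally:
        # Free the server-side search context once we are done.
        es.clear_scroll(scroll_id=scroll_id)
```

You would use it like `for hit in iter_all_hits(es, 'indexname'): ...`. The Python client also ships elasticsearch.helpers.scan, which wraps this same search-then-scroll pattern if you would rather not write the loop yourself.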

McElroy · May 24, 2017, 1:03pm

Thank you.
Does size indicate how many hits I get?

Heinmci · May 24, 2017, 1:03pm

Yes, per request. But as you saw, it can't be over 10 000 by default, so you have to use the scroll API. I don't think you have another choice.

McElroy · May 24, 2017, 1:06pm

OK, so I will get the first 10 000 results. How do I get the rest?
Sorry for the stupid question, but I can't see the forest for the trees right now.

Heinmci · May 24, 2017, 1:16pm

If you look at the code above, the es.scroll function allows you to get results past 10 000.

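from elasticsearch import Elasticsearch

es = Elasticsearch(['http://x.x.x.x:9200/'])
doc = {
    'size': 10000,
    'query': {
        'match_all': {}
    }
}

# first request: opens the scroll and returns hits 1 to 10 000
res = es.search(index="myIndex", doc_type='myType', body=doc, scroll='1m')
scroll = res['_scroll_id']
# second request: returns the next 10 000 hits and a fresh '_scroll_id'
res2 = es.scroll(scroll_id=scroll, scroll='1m')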

In this example, you have your first 10 000 hits in res, and the next 10 000 in res2. If you want results from 20 000 to 30 000, you just take the new scroll id value from res2 and pass it to es.scroll again!
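
If it helps to see the arithmetic, here is a small sketch that works out how many requests will actually return hits for a given size. It assumes hits.total is a plain integer, as in the responses in this thread (newer Elasticsearch versions return a dict there, which this sketch does not handle). The function name requests_needed is made up for illustration.

```python
import math


def requests_needed(res, size):
    """Count the requests (initial search plus scrolls) that return hits.

    Assumes res['hits']['total'] is a plain integer.
    """
    return math.ceil(res['hits']['total'] / size)


# 25 000 documents at 10 000 per page: res, res2, and one more scroll.
print(requests_needed({'hits': {'total': 25000}}, 10000))  # 3
```

One more es.scroll call after that returns an empty hits list, which is the signal that you are done.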

McElroy · May 24, 2017, 1:18pm

AHHHH... forest, there it is.
Thank you for your help. It finally clicked.

Heinmci · May 24, 2017, 1:19pm

No problem, have a nice day!

McElroy · May 24, 2017, 1:43pm

On one of my indexes I get no data, just:
{'timed_out': False, 'hits': {'total': 1843, 'max_score': 1.0, 'hits': []}, '_shards': {'successful': 5, 'total': 5, 'failed': 0}, 'terminated_early': False, '_scroll_id': 'DnF1ZXJ5VGhlbkZldGNoBQAAAAAAARfQFm5rMUVCeUxTVDJHUm5qZ2dBQkpJMncAAAAAAAExGBZyNFIxMV93QVRqT0wtTTNoZ1dUenN3AAAAAAABF88WbmsxRUJ5TFNUMkdSbmpnZ0FCSkkydwAAAAAAAPrrFnpFTW9aaHRPUzd1X0Y0UHRORTFpSFEAAAAAAAExFxZyNFIxMV93QVRqT0wtTTNoZ1dUenN3', 'took': 2}
Any idea on that?

Heinmci · May 24, 2017, 1:49pm

Sorry, not sure why that is.
The only thing that comes to mind is that size was set to 0; other than that, I don't know.

system · June 21, 2017, 1:50pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
