I can't retrieve all data from index

moabbas · August 15, 2017, 8:03am

Hello,
I need to retrieve all the data matching my query, but the result set exceeds the default result window size, which is 10000.
How can I do it?

GET logstash-2017.08.04/_search?scroll=5m
{
  "query": {
    "match": { "type": "xms_api_station_info" }
  },
  "size": 100000
}

Thanks,

Christian_Dahlqvist · August 15, 2017, 8:09am

Have a look at the scroll API. This is specifically designed for retrieving large amounts of data efficiently.

moabbas · August 15, 2017, 8:14am

I used it but it didn't work

{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "logstash-2017.08.04",
        "node": "K3CfiisZQpGhffoxkDxR_A",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
        }
      }
    ]
  },
  "status": 500
}

Christian_Dahlqvist · August 15, 2017, 8:19am

You will need to make multiple requests when using the scroll API.

moabbas · August 15, 2017, 8:20am

What do you mean by multiple requests?

Christian_Dahlqvist · August 15, 2017, 8:21am

Did you read the page I linked to?

moabbas · August 15, 2017, 8:26am

Yes, I read it, but I didn't find any information about multiple requests.

moabbas · August 15, 2017, 8:33am

@Christian_Dahlqvist
By multiple requests, did you mean doing it this way?

GET /_search/scroll
{
"scroll": "5m",
"scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAHvyMFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jhZLM0NmaWlzWlFwR2hmZm94a0R4Ul9BAAAAAAAe_IsWSzNDZmlpc1pRcEdoZmZveGtEeFJfQQAAAAAAHvyNFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jxZLM0NmaWlzWlFwR2hmZm94a0R4Ul9B"
}

Christian_Dahlqvist · August 15, 2017, 9:08am

Each batch should return a new scroll_id to be used in the next request. It looks like your example has multiple requests with the same scroll_id.
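For anyone landing here later, the "multiple requests" pattern can be sketched roughly as follows in Python. This is a minimal sketch, not an official client: `scroll_all`, `http_send` and the `send(method, path, body)` callable are hypothetical names made up for illustration, and the batch size of 1000 is an arbitrary choice that stays under the 10000 limit from the error above. The key points are that `size` is the per-batch size rather than the total, and that each follow-up request passes along whatever scroll_id the previous response returned.

```python
import json
import urllib.request


def http_send(base_url):
    """Build a send(method, path, body) callable that exchanges JSON over HTTP.

    Hypothetical helper for this sketch; base_url is whatever your cluster uses.
    """
    def send(method, path, body):
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return send


def scroll_all(send, index, query, batch_size=1000, keep_alive="5m"):
    """Yield every hit matching `query`, fetching one batch per request."""
    # The first request opens the scroll context. `size` is the batch size,
    # so it must not exceed index.max_result_window (10000 by default).
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": batch_size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        # An empty batch means the scroll is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Continue with the scroll_id from the most recent response.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context instead of waiting for it to time out.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

Assuming a node reachable at `http://localhost:9200` (an assumption, adjust to your setup), you would iterate with something like `for hit in scroll_all(http_send("http://localhost:9200"), "logstash-2017.08.04", {"match": {"type": "xms_api_station_info"}}): ...`. Because the loop simply reuses whatever scroll_id comes back, it does not matter whether the id changes between batches.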

moabbas · August 15, 2017, 9:16am

Every request returns the same scroll_id.
Is that wrong?

Christian_Dahlqvist · August 15, 2017, 9:19am

Can you show the first two requests you sent, along with their respective responses? If they are too large, store the data externally, e.g. in a gist, and link to it here.

moabbas · August 15, 2017, 9:30am

The returned data aren't the same in each response. I have checked them, if that is what you mean.

Christian_Dahlqvist · August 15, 2017, 9:34am

So it works for you now?

moabbas · August 15, 2017, 9:35am

Yes, it works for me.
Thanks a lot for your support, Christian

berg · August 21, 2017, 10:57pm

FWIW, I wrote a tool to extract all data from an index that handles the scroll stuff for you. May not be what you're after but for batch data extraction check out https://github.com/berglh/escroll

system · September 18, 2017, 10:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
