I can't retrieve all data from index

Hello,
I need to retrieve all the data matching my query, but the returned data exceeds the default result window size, which is 10,000.
How can I do it?

GET logstash-2017.08.04/_search?scroll=5m
{
  "query": {
    "match": { "type": "xms_api_station_info" }
  },
  "size": 100000
}

Thanks,

Have a look at the scroll API. This is specifically designed for retrieving large amounts of data efficiently.
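
For example, a minimal sketch of the initial call, reusing the index and query from your post (the size of 1000 is only an illustrative batch size that stays below the default index.max_result_window of 10,000):

# keep the search context alive for 5 minutes and fetch 1,000 hits per batch
GET logstash-2017.08.04/_search?scroll=5m
{
  "query": {
    "match": { "type": "xms_api_station_info" }
  },
  "size": 1000
}

The response contains a _scroll_id, which you then send to the _search/scroll endpoint to fetch the next batch.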

I used it, but it didn't work:

{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "logstash-2017.08.04",
        "node": "K3CfiisZQpGhffoxkDxR_A",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
        }
      }
    ]
  },
  "status": 500
}

You will need to make multiple requests when using the scroll API.
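
Roughly like this, as a sketch (the scroll_id here is a placeholder for the _scroll_id returned by the previous response, not a literal value):

# each call returns the next batch and keeps the search context alive for another 5 minutes
GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "<_scroll_id from the previous response>"
}

You keep repeating that request, each time with the _scroll_id from the most recent response, until a response comes back with an empty hits array.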

What do you mean by multiple requests?

Did you read the page I linked to?

Yes, I read it, but I didn't find information about multiple requests.

@Christian_Dahlqvist
Did you mean making multiple requests, like this?

GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAHvyMFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jhZLM0NmaWlzWlFwR2hmZm94a0R4Ul9BAAAAAAAe_IsWSzNDZmlpc1pRcEdoZmZveGtEeFJfQQAAAAAAHvyNFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jxZLM0NmaWlzWlFwR2hmZm94a0R4Ul9B"
}

GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAHvyMFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jhZLM0NmaWlzWlFwR2hmZm94a0R4Ul9BAAAAAAAe_IsWSzNDZmlpc1pRcEdoZmZveGtEeFJfQQAAAAAAHvyNFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jxZLM0NmaWlzWlFwR2hmZm94a0R4Ul9B"
}

GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAHvyMFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jhZLM0NmaWlzWlFwR2hmZm94a0R4Ul9BAAAAAAAe_IsWSzNDZmlpc1pRcEdoZmZveGtEeFJfQQAAAAAAHvyNFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jxZLM0NmaWlzWlFwR2hmZm94a0R4Ul9B"
}

GET /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAHvyMFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jhZLM0NmaWlzWlFwR2hmZm94a0R4Ul9BAAAAAAAe_IsWSzNDZmlpc1pRcEdoZmZveGtEeFJfQQAAAAAAHvyNFkszQ2ZpaXNaUXBHaGZmb3hrRHhSX0EAAAAAAB78jxZLM0NmaWlzWlFwR2hmZm94a0R4Ul9B"
}

Each batch should return a new scroll_id to be used in the next request. It looks like your example has multiple requests with the same scroll_id.
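
As a trimmed sketch of what each batch response looks like (the values are placeholders and most fields are omitted):

{
  "_scroll_id": "<id to send in the next /_search/scroll request>",
  "hits": {
    "total": 123456,
    "hits": [ ... ]
  }
}

Whichever _scroll_id the most recent response carries is the one to use for the next request; when "hits" comes back empty, the scroll is finished.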

Every request returns the same scroll_id.
Is that wrong?

Can you show the first two requests you sent, with their respective responses? If they are too large, store the data externally, e.g. in a gist, and link to it here.

The returned data aren't the same. I have looked at them, if that is what you mean.

So it works for you now?

Yes, it works for me.
Thanks a lot for your support, Christian

FWIW, I wrote a tool that extracts all data from an index and handles the scroll logic for you. It may not be what you're after, but for batch data extraction check out https://github.com/berglh/escroll
