Elasticsearch response capped at 10k records

I am trying to query my index, which has more than 500k records, but I am only able to extract 10k records at a time. I understand this limit is related to the performance of the application, but how can I do a bulk extract?
With more than 50k records, I don't think using the "size" and "after" options is the right way to go.
The ES version we are currently using is 7.10.2:

"version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss"
}

I tried to use a point in time (PIT) ID for this, but I am not sure whether this version of ES supports PIT, because when I tried it in Dev Tools, it didn't work.

Please help: how can I do a bulk get, or which pagination approach works for more than 500k records?

Thanks.

The Point in Time API is not available in the OSS version; you need to use the Scroll API.

Check the documentation and this example of how to paginate using the Scroll API.

If you are using a client, such as the Python client, there is a helper called scan that handles queries like this for you.
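The loop that such a helper runs can be sketched in plain Python. This is a minimal sketch, not the Python client's scan implementation: `scroll_hits` and the `post(method, path, body)` transport are hypothetical names, and the transport is assumed to send the JSON body and return the parsed response. It opens the scroll with `?scroll=1m`, keeps calling `_search/scroll` with the latest `_scroll_id` until a batch comes back empty, and then clears the scroll context.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: post(method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_hits(post: Transport, index: str, search_body: Dict[str, Any],
                batch_size: int = 1000, keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching search_body, one scroll batch at a time."""
    # Copy the body and add the per-batch size; the caller's dict is not mutated.
    body = dict(search_body, size=batch_size)
    resp = post("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty batch means every match has been returned
            yield from hits
            # Continue with the most recent scroll id; it may change between calls.
            resp = post("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context, even if the caller stops iterating early.
        if scroll_id:
            post("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because it is a generator, the hits can be streamed to a file batch by batch instead of holding all 500k in memory.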

@leandrojmp thanks for the reply, but as mentioned in the documentation, scroll is not recommended for deep pagination beyond 10k records. In my case, I have more than 500k records that I need to extract.
I am not using any client; I am making direct HTTP REST calls to extract data from ES, but as mentioned earlier, I am not able to get more than 10k at a time.
Below is my sample request:

```
curl -X POST "localhost/index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "person.name": "ABC"
    }
  },
  "fields": [
    "person.age"
  ],
  "_source": false
}
'
```
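Ordinary `_search` paging (`from` + `size`) is bounded by the `index.max_result_window` setting, 10,000 by default, which is where the 10k cap comes from. With the scroll API the same body can be reused: add `scroll=<keep-alive>` to the URL and a per-batch `size`, and each scroll call returns the next batch until all matches have been read. A minimal sketch, assuming a batch size of 5000 and a 1m keep-alive (both arbitrary choices); `to_scroll_request` and `ages_from_hits` are hypothetical helpers. Note that values requested through `fields` come back as arrays.

```python
import json
from typing import Any, Dict, List, Tuple


def to_scroll_request(index: str, search_body: Dict[str, Any],
                      batch_size: int = 5000, keep_alive: str = "1m") -> Tuple[str, str]:
    """Return (path, JSON body) for the first request of a scroll."""
    # Same query, fields and _source as the plain search, plus a batch size.
    body = dict(search_body, size=batch_size)
    return f"/{index}/_search?scroll={keep_alive}", json.dumps(body)


def ages_from_hits(hits: List[Dict[str, Any]]) -> List[Any]:
    """Collect person.age values from the 'fields' section of each hit."""
    ages: List[Any] = []
    for hit in hits:
        # Each entry under 'fields' is a list, even for single-valued fields.
        ages.extend(hit.get("fields", {}).get("person.age", []))
    return ages
```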

The documentation points out that you should use search_after together with PIT. As you are using the OSS version, where PIT is not available, the scroll API is still the recommended option for deep pagination.

My recommendation, however, would be to switch to the default distribution and upgrade to the latest 7.17 release.

As Christian already answered, this recommendation is based on the Elastic distribution of Elasticsearch with at least the Basic license.

You are using the Open Source distribution, which does not have the PIT feature, so the recommendation in this case is to use the scroll API.

You can still use the scroll API without a client; just check the documentation for how to do it.
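Without a client, a scroll is just three kinds of plain HTTP calls: open the scroll, fetch the next batch, and clear the scroll. A minimal sketch using only Python's standard library, assuming Elasticsearch listens on `http://localhost:9200` (adjust `BASE_URL`; the request in the question uses `localhost` without a port). The function names are hypothetical; the endpoints and bodies follow the 7.x scroll API.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:9200"  # assumption: adjust host/port to your cluster


def _json_request(method: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build a request carrying a JSON body (DELETE needs one for clear scroll)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def first_page(index: str, search_body: Dict[str, Any],
               size: int = 1000, keep_alive: str = "1m") -> urllib.request.Request:
    """POST /<index>/_search?scroll=... opens the scroll and returns batch one."""
    return _json_request("POST", f"/{index}/_search?scroll={keep_alive}",
                         dict(search_body, size=size))


def next_page(scroll_id: str, keep_alive: str = "1m") -> urllib.request.Request:
    """POST /_search/scroll returns the next batch and extends the keep-alive."""
    return _json_request("POST", "/_search/scroll",
                         {"scroll": keep_alive, "scroll_id": scroll_id})


def clear_scroll(scroll_id: str) -> urllib.request.Request:
    """DELETE /_search/scroll frees the search context early."""
    return _json_request("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Execute a request and parse the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A typical run sends `first_page(...)`, keeps sending `next_page(page["_scroll_id"])` while `page["hits"]["hits"]` is non-empty, and finally sends `clear_scroll(...)` so the context is released before its keep-alive expires.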
