I am trying to query my index, which has more than 500k records, but I am only able to extract 10k records at a time. I understand this has to do with application performance, but how can I do a bulk extract?
With more than 50k records, I don't think the "size" and "search_after" options are the correct way to go.
The ES version we are currently using is 7.10.2.
I tried to use the PIT ID concept for this, but I am not sure whether this version of ES supports PIT IDs, because when I tried the PIT concept in Dev Tools, it didn't work.
Please advise how I can do a bulk get, or any pagination approach, for more than 500k records.
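For context, the 10k ceiling comes from the `index.max_result_window` setting (default 10,000): a plain from/size request past that point is rejected. A sketch against a hypothetical index named `my-index`:

```
GET /my-index/_search
{
  "from": 10000,
  "size": 100,
  "query": { "match_all": {} }
}
```

This returns a "Result window is too large" error; raising `index.max_result_window` is possible but not advisable for 500k records, which is why a pagination mechanism is needed instead.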
@leandrojmp thanks for the reply, but as mentioned in the documentation, scroll is not recommended for deep pagination beyond 10k records. In my case, I have more than 500k records which I need to extract.
I am not using any client; rather, I am making direct HTTP REST calls to extract data from ES, but as mentioned earlier, I am not able to get more than 10k at a time.
Below is my sample request.
The documentation points out that you should use search_after together with a PIT. As you are using the OSS version, where this is not available, the scroll API is still the recommended option for deep pagination.
My recommendation, however, would be to switch to the default distribution and upgrade to the latest 7.17 release.
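For reference, the scroll flow over plain HTTP looks roughly like this (`my-index` is a placeholder; the `_scroll_id` value comes back in each response):

```
POST /my-index/_search?scroll=1m
{
  "size": 10000,
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
```

Repeat the second call, passing the `_scroll_id` from each response, until `hits.hits` comes back empty; you can then `DELETE /_search/scroll` with the ID to free the search context early.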