Retrieving over a million records in Elasticsearch

nik9000 · February 10, 2016, 2:03am

At this point you are better off making 3 of them master-eligible data nodes and the other 3 just data nodes.

Or it could be the scoring. You should see if it gets faster if you sort by _doc.
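Something like this is what I mean by sorting by _doc. It is only a sketch of the request body: the `scroll_body` helper and the page size of 1000 are made up for illustration, and you would send the result with whatever client you already use.

```python
import json


def scroll_body(query, page_size=1000):
    """Build a search body that returns hits in index order.

    Sorting by _doc asks for documents in the order they sit in the index,
    so hits do not have to be ranked by score.
    `page_size` is an arbitrary illustrative value, not a recommendation.
    """
    return {
        "size": page_size,
        "query": query,
        "sort": ["_doc"],
    }


if __name__ == "__main__":
    # match_all stands in for your real query.
    print(json.dumps(scroll_body({"match_all": {}}), indent=2))
```

Time the same scroll with and without the `sort` entry to see whether scoring is the expensive part.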

It could also be fetching the _ids.

You should use the hot_threads API to see what is taking the time.
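For reference, hot threads is a plain GET on `_nodes/hot_threads`. A minimal sketch that only builds the URL, assuming a node at `http://localhost:9200` (swap in your own host); the `hot_threads_url` helper is hypothetical, and the `threads` and `interval` values are just examples:

```python
from urllib.parse import urlencode


def hot_threads_url(base_url, threads=3, interval="500ms"):
    """Return the URL for the nodes hot threads endpoint.

    `threads` is how many busy threads to report per node and `interval`
    is the sampling window. Both values here are illustrative.
    """
    params = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


if __name__ == "__main__":
    # Assumed local node address; fetch this URL while the slow request runs.
    print(hot_threads_url("http://localhost:9200"))
```

The response is plain text, so fetching it with curl while the slow scroll is running is usually enough.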

The bitsets aren't of _ids. They are at the Lucene segment level and _id is a thing Elasticsearch is inserting on top of that. Depending on your query it may not even use the cache: if it needs scores it won't. If the query is super fast without the cache (a term query, for example) then it'll skip the cache as well.
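If you don't need scores, one way to say so is to put the clause in the filter part of a bool query. A rough sketch, assuming a version with `bool` filter clauses; the `as_filter` helper and the `timestamp` field are invented for the example, and whether the cache actually gets used is still up to Elasticsearch:

```python
import json


def as_filter(clause):
    """Wrap a query clause in bool/filter so it runs without scoring."""
    return {"query": {"bool": {"filter": [clause]}}}


if __name__ == "__main__":
    # Hypothetical field name and range, purely for illustration.
    clause = {"range": {"timestamp": {"gte": "2016-01-01"}}}
    print(json.dumps(as_filter(clause), indent=2))
```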

What do you want to do with the results? Elasticsearch's aggregations were built to do interesting things with portions of the documents after applying arbitrary filters. You might have a similar problem. I mean, maybe it's one that can be solved with an aggregation. Or maybe it is one that we just need to better understand.
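To make the aggregation idea concrete, here is a sketch of a request that filters and then buckets on the server instead of pulling every hit back. I don't know your mapping, so the `status` and `customer_id` fields and the `filtered_terms_agg` helper are placeholders:

```python
import json


def filtered_terms_agg(filter_clause, field, buckets=10):
    """Build a body that filters documents, then counts them per value of `field`.

    "size": 0 means no hits are returned, only the aggregation result.
    """
    return {
        "size": 0,
        "query": {"bool": {"filter": [filter_clause]}},
        "aggs": {
            "by_" + field: {"terms": {"field": field, "size": buckets}},
        },
    }


if __name__ == "__main__":
    # Placeholder filter and field names; substitute your own.
    body = filtered_terms_agg({"term": {"status": "active"}}, "customer_id")
    print(json.dumps(body, indent=2))
```

If the end goal is a count, sum, or top-N over the million documents, a request like this returns a handful of buckets rather than a million records.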
