Why is returning many records with no fields (only IDs) CPU intensive?

aliostad · May 17, 2018, 10:40am

Hi,

I have been trying to use Elasticsearch for some simple filtering: for example, for an index with 100K records, return all records matching a criterion (using a terms query).

When the number of matching records goes above 1K, CPU usage goes completely bonkers and the search becomes very expensive. I do not return any field data; only the ID is enough for me, yet it is still expensive.
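For illustration, a request of the kind described above might look roughly like this sketch. The field name `category`, its values, and the `size` of 5000 are hypothetical placeholders, not the actual query; the body only shows a terms filter with `_source` turned off, so that each hit carries just its `_id`.

```python
import json


def build_ids_only_query(field, values, size):
    """Build a search body that filters with a terms query and returns no _source."""
    return {
        "size": size,        # how many hits to return in a single response
        "_source": False,    # do not load document bodies; hits still carry _id
        "query": {
            "bool": {
                # filter context: matching only, no relevance scoring
                "filter": [
                    {"terms": {field: values}}
                ]
            }
        },
    }


def extract_ids(response):
    """Collect the _id of every hit from a search response dictionary."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    # "category" and its values are hypothetical, for illustration only
    body = build_ids_only_query("category", ["a", "b"], size=5000)
    print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body; `extract_ids` then reads the IDs back from the `hits.hits` array of the response.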

I cannot use pagination since I need all of the matching IDs for further processing and re-ordering.

Doing something similar in SQL, for example, is trivial. I know there is no comparison, but I would like to understand whether I am doing something completely wrong or whether this is essentially a limitation of Lucene-based indices.

Document size: ~ 4-20KB
Cluster: 3x beefy machines with 8 cores and 56GB RAM and striped SSDs.

Thanks
Ali

RahulD · May 17, 2018, 10:53am

What is your cluster configuration? And what is the size of each document?

aliostad · May 17, 2018, 10:54am

I am updating the question.

system · June 14, 2018, 10:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
