Why is returning many records with no fields (only IDs) CPU intensive?


I have been trying to use Elasticsearch for some simple filtering. For example, given an index with 100K records, I want to return every record matching a criterion (using a terms query).

When the number of matching records goes above 1K, CPU usage spikes and the search becomes very expensive. I do not return any field data; the ID alone is enough for me, yet it is still expensive.
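
For illustration, a request of this shape (a terms filter with `_source` disabled so that only hit metadata such as `_id` comes back) could be built roughly as follows. This is only a sketch: the field name `category`, the example values, the `size` of 10000, and the helper `build_ids_only_search` are hypothetical and not taken from the actual index or query.

```python
import json


def build_ids_only_search(field, values, size):
    """Build a search request body that filters on `field` and omits _source.

    All argument values used below are placeholders for illustration.
    """
    return {
        "size": size,        # maximum number of hits to return in one response
        "_source": False,    # skip loading the stored document body
        "query": {
            "bool": {
                # filter context: no relevance scoring is computed
                "filter": [{"terms": {field: list(values)}}]
            }
        },
    }


# Hypothetical field and values, purely for illustration.
body = build_ids_only_search("category", ["a", "b"], 10000)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a `_search` request; each hit in the response then carries its `_id` but no document source.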

I cannot use pagination because I need the full result set for further processing and re-ordering.

Doing something similar in SQL, for example, is trivial. I know the two are not directly comparable, but I would like to understand whether I am doing something completely wrong or whether this is essentially a limitation of Lucene-based indices.

Document size: ~ 4-20KB
Cluster: 3 beefy machines, each with 8 cores, 56GB RAM and striped SSDs.


What is your cluster configuration? And what is the size of each document?

I am updating the question.

