On average, what should we expect for query performance as the number of nodes and cores varies?

  1. Does doubling the number of cores roughly halve the query time on average? (from 6 to 12 cores in my case)

  2. Does doubling the number of nodes roughly halve the query time on average? (from 1 to 2 and from 2 to 3 nodes in my case)

  3. If I double the number of nodes and increase the number of replicas, would I generally see a query performance penalty because of the replicas?

It is hard to say, as it depends on what is currently limiting performance. The number of concurrent queries and the number of shards queried also matter.

If all data is cached in the OS page cache, you might be CPU limited. If so, you need to make sure you have enough shards to make use of the parallel processing capacity.
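Not from the thread, but a minimal sketch of that shard-versus-core check, assuming the Python `requests` library, a cluster at `http://localhost:9200`, and a placeholder index name (`index-name`); both endpoints used here are standard Elasticsearch APIs:

```python
import requests

ES = "http://localhost:9200"   # assumed endpoint; adjust for your cluster

# Shard listing for one index; "index-name" is a placeholder.
shards = requests.get(f"{ES}/_cat/shards/index-name",
                      params={"format": "json"}).json()
primaries = sum(1 for s in shards if s["prirep"] == "p")

# CPU cores per node, from the _nodes/os info API.
nodes = requests.get(f"{ES}/_nodes/os").json()
cores = sum(n["os"]["available_processors"] for n in nodes["nodes"].values())

print(f"{primaries} primary shards vs {cores} cores in the cluster")
# A single query is processed with roughly one thread per shard copy it
# hits, so with fewer shards than cores a lone query cannot saturate the
# cluster's CPUs.
```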

If you are instead limited by disk performance, adding CPU resources may not help at all.
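A small illustration of why (not from the thread): under Amdahl's law, if only a fraction `p` of query time is spent on the resource you scale, the best-case speedup is capped no matter how much hardware you add. The values of `p` and the scaling factor `s` below are made-up examples:

```python
def speedup(p: float, s: float) -> float:
    """Best-case overall speedup when fraction p of the work scales by s."""
    return 1.0 / ((1.0 - p) + p / s)

# If 90% of query time is CPU-bound, doubling cores gives ~1.82x, not 2x.
print(speedup(0.9, 2))   # ~1.818
# If only 30% is CPU-bound (disk-bound workload), doubling cores gives ~1.18x.
print(speedup(0.3, 2))   # ~1.176
```

This is the arithmetic behind questions 1 and 2 above: doubling a resource only halves query time in the ideal case where that resource accounts for essentially all of the query time.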

I get it.

Concurrent queries are not my case; I have a large dataset with a low number of queries.

Is there an easy way to know if the bottleneck is disk I/O, CPU, or memory?

Run queries and monitor GC, CPU usage, and disk utilization/iowait. That should give you a good indication. If you have large amounts of data, much more than will fit in the OS page cache, disk I/O is often the bottleneck, so that is what I would check first.
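A minimal sketch of that monitoring, run on the Elasticsearch node itself, assuming Linux and the third-party `psutil` package; the interpretation comments are rules of thumb, not exact thresholds:

```python
import psutil

prev = psutil.disk_io_counters()
for _ in range(30):                        # ~30 seconds of samples while queries run
    cpu = psutil.cpu_times_percent(interval=1.0)
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    prev = cur
    # High user/system with low iowait  -> likely CPU bound.
    # High iowait (or sustained reads) with idle CPU -> likely disk bound.
    print(f"user={cpu.user:5.1f}%  system={cpu.system:5.1f}%  "
          f"iowait={getattr(cpu, 'iowait', 0.0):5.1f}%  "
          f"disk_read={read_mb:7.1f} MB/s")
```

For the GC side, the `GET _nodes/stats/jvm` endpoint reports per-collector collection counts and times, or you can watch for GC warnings in the Elasticsearch logs.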

I will measure it and get back.
