Tune Elasticsearch to process thousands of simultaneous searches

In order to optimize for high query concurrency it is important to make sure that all data can fit into the operating system page cache so that disk I/O can be minimised. That seem to be the case as each node would hold approximately 12GB of indexed data and the heap will be less than 32GB in size, but if you are continously updating the index this will affect performance and potentially impact the cache.

The second step is to use your threadpools as efficiently as possible. The best way to do this in your case would be to have a single primary shard and 8 replica shards. This assumes that the query latency of a single shard holding the full data set is sufficient. If that is not the case, split the index into as few primary shards as possible until latency is acceptable. The more primary shards you have, the fewer queries can be efficiently served concurrently.

I would recommend setting it up this way and then test a heavy query load (without updates) against it to see how it performs. You can then do the same test while updating to quantify the impact. If queries do not need access to prior updates and concurrent indexing and querying reduces performance significantly, it may be worth considering queueing up all updates, e.g. on file or in Kafka, and apply them only once all querying and processing has completed.

3 Likes