Fine-Tuned Cluster - Consultation


We're currently building our cluster and experimenting with the different components to get the best performance.

  • Current Cluster Nodes:
    3 data nodes, each with 16 GB RAM, 6 CPU cores.
    3 master nodes, each with 4 GB RAM, 4 CPU cores.
  • Data:
    Initially we wanted to use one big index, but we quickly realized that's bad for performance and maintenance. We switched to time-based indices, so we now have a weekly index of about 100 GB. It's important to note that our search granularity is usually one month, meaning a search request involves 4 indices in practice.
    Refresh interval: 30 seconds.
  • Throughput:
    We have a low write throughput (about 50-100 documents per second), written in bulks. We don't really have many search requests per second; say 10 users making 6 concurrent searches once in a while. The searches are 6 heavy aggregation queries that run against Elasticsearch in parallel. Each query consists of a bool query and a chain of a nested aggregation, a filter aggregation, and a terms aggregation sorted by a reverse_nested aggregation. Right now we don't use the multi-search API.
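For reference, a single query of the shape described above might look like the following query-DSL sketch. All field names (`@timestamp`, `events`, `events.type`, `events.user_id`, `session_id`) and aggregation names are illustrative assumptions, not our actual mapping; the terms buckets are ordered by a metric computed under the reverse_nested aggregation:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1M/d" } } }
      ]
    }
  },
  "aggs": {
    "inside_events": {
      "nested": { "path": "events" },
      "aggs": {
        "clicks_only": {
          "filter": { "term": { "events.type": "click" } },
          "aggs": {
            "by_user": {
              "terms": {
                "field": "events.user_id",
                "order": { "back_to_root>root_docs": "desc" }
              },
              "aggs": {
                "back_to_root": {
                  "reverse_nested": {},
                  "aggs": {
                    "root_docs": { "value_count": { "field": "session_id" } }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```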

These are the best-practice rules we know:

  • Shard size should be lower than 50 GB.
  • In one index: #Shards = #Nodes * #Cores per node.
    By the way, does #Shards include replica shards, or only primary shards?
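Plugging the cluster numbers from above into that rule gives a quick sanity check. This is just arithmetic, assuming the rule counts primary shards only and using the weekly ~100 GB index size from the post:

```python
# Sanity-check the rule of thumb against the numbers in the post.
# Assumption: #Shards counts primaries only; replicas are added on top.
data_nodes = 3
cores_per_node = 6
primary_shards = data_nodes * cores_per_node   # 3 * 6 = 18

weekly_index_gb = 100                          # weekly index size from the post
shard_size_gb = weekly_index_gb / primary_shards

print(primary_shards)           # 18
print(round(shard_size_gb, 1))  # 5.6 -> well under the 50 GB cap
```

At roughly 5.6 GB each, these shards are far below the 50 GB ceiling; at this index size the bigger risk is too many small shards per node, so it may be worth benchmarking fewer primaries per weekly index (e.g. 3 or 6) as well.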

We obviously know we need to strengthen the cluster.
We would like to hear any general recommendations.

Hi! There are a few things that I used when trying to improve search performance:

  • Read the "Tune for search speed" docs.
  • While running the queries, use system monitoring tools such as htop, iotop and nethogs to determine the main bottleneck.
  • Use x-pack's profiler in kibana if you can.
  • Replicas are used when running a search query, so I guess you can count them in your #Shards = #Nodes * #Cores rule.
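Also, since the original post mentions not using the multi-search API yet: the 6 parallel aggregation queries could be sent in a single `_msearch` round trip. A minimal sketch of building the NDJSON payload; the index pattern and the aggregations are placeholders, not your real queries:

```python
import json

def build_msearch_body(index_pattern, queries):
    """Build an NDJSON _msearch payload: one header line plus one body line per query."""
    lines = []
    for q in queries:
        lines.append(json.dumps({"index": index_pattern}))  # header line
        lines.append(json.dumps(q))                         # query body line
    return "\n".join(lines) + "\n"  # _msearch requires a trailing newline

# Six placeholder aggregation queries sent as one round trip.
queries = [
    {"size": 0, "aggs": {"agg%d" % i: {"terms": {"field": "some_field"}}}}
    for i in range(6)
]
payload = build_msearch_body("logs-2017.*", queries)
print(payload.count("\n"))  # 12 -> a header + body pair per query
```

The payload can then be POSTed to `_msearch`, which runs all six searches server-side and returns the responses together.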

Hope this helps.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.