A Few Questions Regarding Production Cluster Configuration


I would like to get expert views on the production configuration of Elasticsearch.

We have an Elasticsearch (version 1.5.0) cluster of 15 machines.

5 machines are master-eligible as well as data nodes.
10 machines are data-only nodes.
60 indices, 900 shards, 3 billion documents, total primary size 1.7 TB

Each node has 30 GB RAM, 8 vCPUs, and a 1 TB HDD. We have configured a total of 230 GB of heap memory for the Elasticsearch cluster.
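As a quick sanity check on those numbers, here is a minimal sketch of the per-node heap arithmetic. It assumes the 230 GB of heap is spread evenly across all 15 nodes, which the post does not state; the variable names are only illustrative.

```python
# Figures quoted in the post.
NODES = 15
RAM_PER_NODE_GB = 30
TOTAL_HEAP_GB = 230

# Assumption: heap is divided evenly across the nodes.
heap_per_node_gb = TOTAL_HEAP_GB / NODES
heap_fraction = heap_per_node_gb / RAM_PER_NODE_GB

print(f"{heap_per_node_gb:.1f} GB heap per node, {heap_fraction:.0%} of RAM")
```

Under that assumption each node gives roughly half of its RAM to the JVM heap, which is in line with the commonly cited guidance of leaving the other half to the filesystem cache.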

We index 200 million documents daily. Document size is around 1-2 KB. Our search request rate is 200 requests per second.
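For a sense of scale, the daily ingest volume implied by those figures can be estimated as below. This is only a rough sketch of the raw document volume; the size on disk after indexing will differ, and the function name is hypothetical.

```python
# Figures quoted in the post.
DOCS_PER_DAY = 200_000_000
DOC_SIZE_KB = (1, 2)  # lower and upper bound per document


def daily_ingest_gb(docs_per_day, doc_size_kb):
    """Raw document volume per day in GB (decimal units: 1 GB = 1,000,000 KB)."""
    return docs_per_day * doc_size_kb / 1_000_000


low, high = (daily_ingest_gb(DOCS_PER_DAY, kb) for kb in DOC_SIZE_KB)
print(f"{low:.0f}-{high:.0f} GB of raw documents per day")
```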

Replication factor is 2.

I have a few questions:

  1. Should we configure a few machines as master-only? How will that impact the performance of the cluster?
  2. What is the recommended RAM and CPU per node? Can we reduce the number of CPUs per node without compromising cluster performance?
  3. Should we increase the replication factor from 2?

We will definitely upgrade ES from 1.5 to 5.x, so apart from that, if anyone has any suggestions, please share your thoughts.

Thanks in advance!

If you can, splitting out dedicated master-only nodes is best practice. It helps cluster stability.

You need to test that yourself based on your use case.

Is the data business critical? Is your infrastructure unstable?

Hi Mark,

Thanks for the response!

What would be the recommended number of master-only machines in a cluster of 15 machines?

Currently, if any heavy query using aggregations is run on the cluster, the machines go to high CPU usage, which causes a few of them to get disconnected from the cluster. We were facing the same issue when our cluster had 8 machines. Last month we scaled it up to 15 machines, but it didn't help much. I am worried that we now have a cluster of 15 high-spec machines and aggregation queries are still breaking it.

Regarding the replication factor: the data is not critical, but can increasing the replication factor improve query performance?

3, it's always 3 :)
You can add more, as long as it's an odd number, but you only need the majority of whatever you have. And having 9 masters, for a majority of 5, doesn't really give you more.
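The majority arithmetic behind that advice can be sketched as follows. The helper names are hypothetical; in Elasticsearch 1.x the resulting quorum is the value conventionally given to the `discovery.zen.minimum_master_nodes` setting.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (3, 4, 5, 9):
    print(f"{n} masters: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

An even count buys nothing over the odd count below it: 4 masters need a quorum of 3 and still tolerate only one failure, the same as 3 masters.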

Depends what the aggs are and why things break. You are on 1.5 and there's been heaps of changes since then to stop this sort of pattern.

It can, but there are other ways that don't need N+1 amounts of disk to do that.
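To make the disk cost concrete, here is a rough sketch using the figures from the original post (1.7 TB of primaries, 15 nodes with 1 TB each). It assumes every replica is a full copy of the primary data; the function name is hypothetical.

```python
# Figures quoted in the original post.
PRIMARY_TB = 1.7
CLUSTER_DISK_TB = 15 * 1.0  # 15 nodes with 1 TB each


def total_store_tb(primary_tb, replicas):
    """Total on-disk size: one primary copy plus one full copy per replica."""
    return primary_tb * (1 + replicas)


for replicas in range(4):
    total = total_store_tb(PRIMARY_TB, replicas)
    share = total / CLUSTER_DISK_TB
    print(f"{replicas} replica(s): {total:.1f} TB, {share:.0%} of cluster disk")
```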
