Elasticsearch architecture and planning

learnhub17 · March 25, 2018, 7:13pm

Hi, we are doing a POC to evaluate ES for production, and I am trying to learn the best practices for running it there.
My concerns are:

For a write-heavy environment, what is the best architecture to configure?
How should we design the indices and choose the number of shards?
How do we configure indices so that they keep only the last 5 months of data and automatically delete the rest?

My current setup is ES 6.2.2:
3 master nodes (4GB, 2 CPU), 2 client nodes (16GB RAM, 4 CPU), 4 data nodes (64GB RAM, 16 CPU). On all servers, half of the memory is configured as heap.
Data written per day: 20GB
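For reference, the retention question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "5 months" is approximated as 150 days, one replica is assumed, data is assumed to go into one index per day, and the `logs-YYYY.MM.DD` naming and both function names are hypothetical. The 20GB figure is raw daily data, so the real on-disk size may differ.

```python
from datetime import date, timedelta

DAILY_GB = 20          # from the post: about 20GB written per day
RETENTION_DAYS = 150   # assumption: "5 months" approximated as 150 days
REPLICAS = 1           # assumption: one replica per primary shard


def retained_storage_gb(daily_gb, retention_days, replicas):
    """Approximate size of the retention window, primaries plus replicas."""
    return daily_gb * retention_days * (1 + replicas)


def expired_indices(index_names, today, retention_days, prefix="logs-"):
    """Return daily indices whose date suffix is older than the retention window.

    The prefix and the YYYY.MM.DD suffix are a hypothetical naming scheme.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month, day = (int(p) for p in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # skip names without a valid YYYY.MM.DD suffix
        if index_day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    print(retained_storage_gb(DAILY_GB, RETENTION_DAYS, REPLICAS), "GB retained")
    names = ["logs-2018.03.25", "logs-2017.10.01"]
    print(expired_indices(names, date(2018, 3, 25), RETENTION_DAYS))
```

With daily indices, removing old data means deleting whole indices, which is much cheaper than deleting individual documents. The names returned by `expired_indices` would then be passed to the delete index API or handled by a tool such as Curator.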

dadoonet · March 25, 2018, 7:41pm

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

rcowart · March 26, 2018, 8:12am

First off, 20GB of indexed data per day is not really very much data. Last week I set up a POC for a user where 100GB per day is indexed and stored on a single node. This node runs Elasticsearch, Kibana and Logstash (with a very complex, resource-intensive pipeline). The hardware is 16 cores, 128GB RAM and data stored on 8 spinning disks. The disks are the real limiter here. They were already configured in RAID-5, which is clearly not ideal for write-biased workloads, and will be reconfigured.

My point is that your cluster is overkill for 20GB per day; in fact, it is likely overkill for 100GB per day. I would recommend a simpler 3-node cluster with these characteristics:

Ideally these nodes would use SSD storage. If there are multiple drives, they should be configured as JBOD. Index replicas will provide the necessary redundancy.
64GB is a good starting point for RAM. More can be even better, as it gives the OS more page cache space for caching disk IO.
If you have a larger number of smaller indices (<5GB), start with 2 shards and 1 replica. For a small number of larger indices, 3 shards and 1 replica will better spread the load. The general rule is that increasing shards increases ingestion performance, while increasing replicas improves query performance.
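The shard rule of thumb above could be turned into an index settings body along these lines. This is a minimal sketch, not a definitive recipe: the helper names and the 5GB threshold parameter are illustrative, while `number_of_shards` and `number_of_replicas` are the standard index settings. The resulting body is what would be sent when creating an index or placed in an index template.

```python
import json


def index_settings(expected_index_gb, small_index_gb=5):
    """Pick shard and replica counts using the rule of thumb from this post.

    Indices under small_index_gb get 2 primary shards, larger ones get 3.
    One replica is used in both cases.
    """
    shards = 2 if expected_index_gb < small_index_gb else 3
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 1,
            }
        }
    }


def total_shard_copies(body):
    """Count primaries plus replica copies for one index."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


if __name__ == "__main__":
    # Assumption: one index per day holding roughly 20GB.
    body = index_settings(expected_index_gb=20)
    print(json.dumps(body, indent=2))
    print("shard copies per index:", total_shard_copies(body))
```

For a 20GB daily index this gives 3 primaries and 1 replica, so 6 shard copies per index, or 2 per node on a 3-node cluster.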

Basically, the resources saved by reducing from the 9 nodes you mention to the 3 nodes that you really need can be invested in the best storage and more RAM for those 3 nodes.

system · April 23, 2018, 8:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
