Elasticsearch architecture and planning

Hi, we are doing a POC to use ES in production, and I am trying to learn the best practices for running it there.
My concerns are:

  1. In a write-heavy environment, what is the best architecture to configure?
  2. How should we design indices and choose the number of shards?
  3. How do we configure indices so that only the last 5 months of data are kept and older data is deleted automatically?

My current setup is ES 6.2.2:
3 master nodes (4GB, 2 CPU), 2 client nodes (16GB RAM, 4 CPU), 4 data nodes (64GB RAM, 16 CPU). On all servers, half of the memory is configured as heap.
Data written per day: 20GB

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

First off, 20GB of indexed data per day is not really very much data. Last week I set up a POC for a user where 100GB per day is indexed and stored on a single node. This node runs Elasticsearch, Kibana and Logstash (with a very complex, resource-intensive pipeline). The hardware is 16 cores, 128GB RAM and data stored on 8 spinning disks. The disks are the real limiter here. They were already configured in RAID-5, which is clearly not ideal for write-biased workloads, and will be reconfigured.

My point is... your cluster is overkill for 20GB per day... in fact it is likely overkill for 100GB per day. I would recommend a simpler 3-node cluster, with these characteristics:

  • Ideally these nodes would use SSD storage. If there are multiple drives, they should be configured as JBOD. Index replicas will provide the necessary redundancy.

  • 64GB is a good starting point for RAM. More can be even better as it provides more page cache space for the OS to cache disk IO.

  • If you have a larger number of smaller indices (<5GB), start with 2 shards and 1 replica. For a small number of larger indices, 3 shards and 1 replica will better spread the load. As a rule of thumb, increasing shards increases ingestion performance, while increasing replicas improves query performance.
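
As a rough sketch of the shard guidance above, the snippet below builds the body of an index template with the suggested starting values. The 5GB threshold and the shard/replica counts come from the advice above; the "logs-*" index pattern and the function names are only assumptions for illustration, and the body would still have to be sent to the cluster (for example with a PUT to _template/<name>), which is not shown here.

```python
import json

# Threshold between "smaller" and "larger" indices, taken from the advice above.
SMALL_INDEX_GB = 5


def starting_shards(expected_index_size_gb):
    """Suggested starting number of primary shards for one index."""
    return 2 if expected_index_size_gb < SMALL_INDEX_GB else 3


def template_body(index_pattern, expected_index_size_gb):
    """Build an index template body (6.x style, using "index_patterns").

    index_pattern is a hypothetical naming scheme, e.g. "logs-*".
    """
    return {
        "index_patterns": [index_pattern],
        "settings": {
            "number_of_shards": starting_shards(expected_index_size_gb),
            "number_of_replicas": 1,
        },
    }


if __name__ == "__main__":
    # One daily index of roughly 20GB, as in the question.
    print(json.dumps(template_body("logs-*", 20), indent=2))
```

These are starting points only; the shard count is worth revisiting once real index sizes are known.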

Basically, the resources saved by reducing from the 9 nodes you mention to the 3 nodes that you really need can be invested in the best storage and more RAM for those 3 nodes.
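
To get a feel for how much disk those 3 nodes would need, here is a back-of-the-envelope estimate. It assumes that 5 months is about 150 days, that 20GB per day is the size on disk of the primary shards, and that data spreads evenly across nodes; it ignores free-space headroom, so treat the result as a lower bound rather than a sizing recommendation.

```python
def storage_per_node_gb(daily_gb, retention_days, replicas, nodes):
    """Estimate the disk space each node holds at full retention."""
    primary_gb = daily_gb * retention_days   # primary shards only
    total_gb = primary_gb * (1 + replicas)   # primaries plus replica copies
    return total_gb / nodes                  # assumes an even spread


if __name__ == "__main__":
    # 20GB/day, ~150 days retained (assumed), 1 replica, 3 nodes
    per_node = storage_per_node_gb(20, 150, 1, 3)
    print(f"{per_node:.0f} GB per node")  # prints "2000 GB per node"
```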
