Hi, we are doing a POC for using ES in production, and I am trying to learn the best practices for running it there.
My concerns are:
In a write-heavy environment, what is the best architecture to configure?
How should we design the indices and choose the number of shards?
How do we configure an index so that it keeps only the last 5 months of data and automatically deletes the rest?
My current setup is: ES 6.2.2
3 master nodes (4GB RAM, 2 CPU), 2 client nodes (16GB RAM, 4 CPU), 4 data nodes (64GB RAM, 16 CPU). On all servers, half of the memory is configured as heap.
Data written per day: 20GB
First off, 20GB of indexed data per day is not really very much data. Last week I set up a POC for a user where 100GB per day is indexed and stored on a single node. That node runs Elasticsearch, Kibana and Logstash (with a very complex, resource-intensive pipeline). The hardware is 16 cores, 128GB RAM and data stored on 8 spinning disks. The disks are the real limiter here. They were already configured in RAID-5, which is clearly not ideal for write-biased workloads, and will be reconfigured.
My point is... your cluster is overkill for 20GB per day... in fact it is likely overkill for 100GB per day. I would recommend a simpler 3-node cluster, with these characteristics:
Ideally these nodes would use SSD storage. If there are multiple drives, they should be configured as JBOD. Index replicas will provide the necessary redundancy.
64GB is a good starting point for RAM. More can be even better as it provides more page cache space for the OS to cache disk IO.
If you have a larger number of smaller indices (<5GB), start with 2 shards and 1 replica. For a small number of larger indices, 3 shards and 1 replica will better spread the load. As a rule of thumb, more primary shards improve ingestion performance, while more replicas improve query performance.
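To make the shard and replica suggestion concrete, here is a minimal sketch that builds an index template body of the kind ES 6.x accepts at `PUT _template/<name>`. The helper name `index_template`, the `logs-*` pattern, and the 20GB expected index size are only assumptions for illustration; the 5GB threshold and the 2 or 3 shard choice come from the rule of thumb above.

```python
import json


def index_template(pattern, expected_index_gb, small_index_gb=5):
    """Build a template body using the rule of thumb above.

    Hypothetical helper: indices smaller than small_index_gb get 2 primary
    shards, larger ones get 3. Both get 1 replica.
    """
    shards = 2 if expected_index_gb < small_index_gb else 3
    return {
        "index_patterns": [pattern],  # which new indices the template applies to
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": 1,
        },
    }


# Assumption: one index per day named logs-YYYY.MM.DD, about 20GB each.
template = index_template("logs-*", expected_index_gb=20)
print(json.dumps(template, indent=2))
```

The printed JSON could then be sent as the request body when creating the template. Note that the shard count only applies to indices created after the template exists.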
Basically, the resources saved by reducing from the 9 nodes you mention to the 3 nodes that you really need can be invested in the best storage and more RAM for those 3 nodes.
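As a rough sanity check on storage for those 3 nodes, the arithmetic can be sketched as below, together with one possible way to pick out indices that have aged past the 5-month window. This is only a sketch under assumptions: that 20GB per day is the on-disk size of the primary shards, that 5 months is about 150 days, and that indices are daily and named `logs-YYYY.MM.DD`. Both function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def storage_estimate_gb(daily_gb, retention_days, replicas=1, nodes=3):
    """Estimate disk usage, assuming on-disk size equals the daily volume."""
    primary = daily_gb * retention_days
    total = primary * (1 + replicas)  # each replica is a full copy
    return {"primary_gb": primary, "total_gb": total, "per_node_gb": total / nodes}


def expired_indices(index_names, today, retention_days, prefix="logs-"):
    """Return daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


estimate = storage_estimate_gb(daily_gb=20, retention_days=150)
print(estimate)

old = expired_indices(
    ["logs-2018.01.01", "logs-2018.06.01", "other"],
    today=date(2018, 6, 15),
    retention_days=150,
)
print(old)
```

With these assumptions the estimate comes to about 6TB in total, or roughly 2TB per node before any headroom. The names returned by `expired_indices` would be the candidates to delete, for example from a daily scheduled job.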