I am fairly new to Elasticsearch and have been going through a lot of documentation to understand its internals so that we can optimize our cluster.
Our Elasticsearch 6.1 cluster, with 28 data nodes and 5 master nodes, very frequently goes into Yellow or Red status and ends up with many unassigned shards. Here are more details about the cluster:
Data Nodes - 28 (124 GB RAM, 8 cores, 55 GB heap)
Master Nodes - 5
Total Indices - 1500
Total Shards - 7000
Total Docs - 6,920,711,994
Total Data Size - 6 TB
About 10% of the indices have 20 shards with 2 replicas; the rest have 2 shards with 2 replicas.
The 20-shard indices hold about 32 GB of data each, and the other indices are below 1 GB each.
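To put the shard layout above in numbers, here is a rough back-of-the-envelope sketch in Python. The figures come from this post. Treating the 32 GB and 1 GB sizes as per-index primary data, and the 7000 total as the number of shard copies spread across the data nodes, are assumptions on my part, and the function names are only illustrative.

```python
# Figures from the post. How they are counted is an assumption:
# sizes are treated as per-index primary data, and TOTAL_SHARDS as
# the number of shard copies spread across the data nodes.
DATA_NODES = 28
TOTAL_SHARDS = 7000
REPLICAS = 2

LARGE_INDEX_SIZE_GB = 32      # indices with 20 shards
LARGE_INDEX_PRIMARIES = 20
SMALL_INDEX_SIZE_GB = 1       # upper bound for the remaining indices
SMALL_INDEX_PRIMARIES = 2


def avg_primary_shard_size_gb(index_size_gb, primaries):
    """Average size of one primary shard, assuming data is spread evenly."""
    return index_size_gb / primaries


def shard_copies_per_index(primaries, replicas):
    """Primaries plus all of their replica copies for a single index."""
    return primaries * (1 + replicas)


shards_per_node = TOTAL_SHARDS / DATA_NODES
large_shard_gb = avg_primary_shard_size_gb(LARGE_INDEX_SIZE_GB, LARGE_INDEX_PRIMARIES)
small_shard_gb = avg_primary_shard_size_gb(SMALL_INDEX_SIZE_GB, SMALL_INDEX_PRIMARIES)

print(f"shards per data node:          {shards_per_node:.0f}")
print(f"avg shard size, large indices: {large_shard_gb:.1f} GB")
print(f"max shard size, small indices: {small_shard_gb:.1f} GB")
print(f"shard copies per large index:  {shard_copies_per_index(LARGE_INDEX_PRIMARIES, REPLICAS)}")
print(f"shard copies per small index:  {shard_copies_per_index(SMALL_INDEX_PRIMARIES, REPLICAS)}")
```

Under those assumptions, each data node holds a few hundred shards, and most of them are well under 2 GB.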
We index about 50 GB of data per day on average and discard the same amount each day. We also run many aggregation queries, at roughly 3-4 requests per minute.
We very frequently see unassigned shards, which take a couple of hours to recover. We have also observed very high heap usage and, on several occasions, OOM errors.
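In case it helps with diagnosis, below is a minimal Python sketch of how the unassigned shards could be grouped by the reason Elasticsearch reports for them. It assumes the input is the JSON returned by `GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason`. The sample response, index names and helper names are hypothetical and not taken from our cluster. For a single shard, `GET _cluster/allocation/explain` should give a more detailed explanation.

```python
import json
from collections import Counter

# Hypothetical sample of what
#   GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
# might return. Index names and values are made up for illustration.
SAMPLE_RESPONSE = """
[
  {"index": "logs-a", "shard": "0", "prirep": "p", "state": "STARTED",    "unassigned.reason": null},
  {"index": "logs-a", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "logs-b", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "logs-c", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "ALLOCATION_FAILED"}
]
"""


def count_unassigned_reasons(cat_shards_json):
    """Count unassigned shard copies per reported reason."""
    rows = json.loads(cat_shards_json)
    return Counter(
        row.get("unassigned.reason") or "UNKNOWN"
        for row in rows
        if row.get("state") == "UNASSIGNED"
    )


def unassigned_primaries(cat_shards_json):
    """List (index, shard) pairs whose primary copy is unassigned (Red status)."""
    rows = json.loads(cat_shards_json)
    return [
        (row["index"], row["shard"])
        for row in rows
        if row.get("state") == "UNASSIGNED" and row.get("prirep") == "p"
    ]


reasons = count_unassigned_reasons(SAMPLE_RESPONSE)
for reason, count in reasons.most_common():
    print(f"{reason}: {count}")
print("unassigned primaries:", unassigned_primaries(SAMPLE_RESPONSE))
```

Unassigned primaries correspond to Red status, while unassigned replicas only make the cluster Yellow, so separating the two shows which indices are actually unavailable.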
Is there anything very wrong with our cluster setup?
Thanks & Regards