So I have 4 nodes in my ELK stack. I have a Logstash server that parses syslog messages and ships them into my cluster, plus 3 ES nodes: one is strictly a master node, and the other two are master/data eligible.
It all works fine until I reach a 'magic point' in my data: the cluster goes red and nothing I do will fix it. This seems to happen once the data nodes have accumulated a large number of indices. My current workaround, after taking snapshots, is to flush all indices 30 days and older; the cluster then goes green and the whole ELK stack starts functioning normally again.
I have the default shard count (5 primaries) and 1 replica per index.
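For context on why the index count bites, here's a back-of-envelope shard estimate, assuming one daily logstash-style index at those default settings (the daily-index pattern is an assumption; adjust if you roll indices differently):

```python
# Rough shard-count estimate for 90 days of retention,
# assuming one index per day with default ES settings.
primaries_per_index = 5
replica_shards_per_index = 5   # 1 replica copy of each of the 5 primaries
days_retained = 90
data_nodes = 2

shards_total = days_retained * (primaries_per_index + replica_shards_per_index)
shards_per_node = shards_total // data_nodes

print(shards_total, shards_per_node)  # 900 total shards, 450 per data node
```

450 shards per node is a lot: every shard is a full Lucene index with its own heap, file-handle, and cluster-state overhead, which is consistent with the cluster falling over as old indices pile up.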
My question is this: I need to keep 90 days of active data plus 1 year of logs (think PCI DSS). I have Curator taking snapshots of the data. My problem is that I can't get to 90 days with the raw amount of data.
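For what it's worth, the snapshot-then-reduce-load pattern can be automated in Curator itself. This is a minimal sketch of a Curator 4.x-style action file, assuming daily `logstash-YYYY.MM.DD` index names and a snapshot repository named `my_backup_repo` (both assumptions; substitute your own):

```yaml
# Hypothetical Curator action file: snapshot indices older than 30 days,
# then close anything older than the 90-day active window.
actions:
  1:
    action: snapshot
    description: Snapshot logstash indices older than 30 days
    options:
      repository: my_backup_repo   # assumed repository name
      wait_for_completion: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 30
  2:
    action: close
    description: Close indices past the 90-day active window to free heap
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 90
```

Closed indices stay on disk (and in snapshots) but stop consuming heap and shard-management overhead, which keeps the year of PCI DSS data retrievable without keeping it all hot.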
Is this a sign that I need more shards? Do I need more data nodes?