Elasticsearch / Logstash / Kibana OSS 6.2.4
We ingest about 23 GB of logs per day. Initially the cluster handled ingestion fine, though it was slow at building large indices. After ingesting about 2 TB of index data it slowed down further, and eventually the cluster status turned red.
How can the cluster recover from this, and how can we prevent it from happening again?
Thanks.

Relevant log lines:
{"log":"[2018-10-09T22:13:30,824][WARN ][o.e.c.s.MasterService ] [7BNppEw] cluster state update task [shard-started shard id [[logstash-mail-2018.09.17][1]], allocation id [Jo_7mNnMSWKiLzrKB3FLjA], primary term [0], message [after existing recovery][shard id [[logstash-mail-2018.09.17][1]], allocation id [Jo_7mNnMSWKiLzrKB3FLjA], primary term [0], message [after existing recovery]], shard-started shard id [[logstash-cron-2018.09.17][2]], allocation id [QSpAD5I2SS-KrUdxudGeig], primary term [0], message [after existing recovery][shard id [[logstash-cron-2018.09.17][2]], allocation id [QSpAD5I2SS-KrUdxudGeig], primary term [0], message [after existing recovery]], shard-started shard id [[logstash-apache-error-2018.09.17][1]], allocation id [647TJjG4QzeLOmtOtFAHMA], primary term [0], message [after existing recovery][shard id [[logstash-apache-error-2018.09.17][1]], allocation id [647TJjG4QzeLOmtOtFAHMA], primary term [0], message [after existing recovery]]] took [39.3s] above the warn threshold of 30s\n","stream":"stdout","time":"2018-10-09T22:13:30.825547563Z"}
{"log":"[2018-10-09T22:13:30,824][WARN ][o.e.c.s.ClusterApplierService] [7BNppEw] cluster state applier task [apply cluster state (from master [master {7BNppEw}{7BNppEwTRNuuSREQEKewnA}{GToVKQqrTsGVoMY0GWivNQ}{192.168.144.2}{192.168.144.2:9300} committed version [98] source [shard-started shard id [[logstash-mail-2018.09.17][1]], allocation id [Jo_7mNnMSWKiLzrKB3FLjA], primary term [0], message [after existing recovery][shard id [[logstash-mail-2018.09.17][1]], allocation id [Jo_7mNnMSWKiLzrKB3FLjA], primary term [0], message [after existing recovery]], shard-started shard id [[logstash-cron-2018.09.17][2]], allocation id [QSpAD5I2SS-KrUdxudGeig], primary term [0], message [after existing recovery][shard id [[logstash-cron-2018.09.17][2]], allocation id [QSpAD5I2SS-KrUdxudGeig], primary term [0], message [after existing recovery]], shard-started shard id [[logstash-apache-error-2018.09.17][1]], allocation id [647TJjG4QzeLOmtOtFAHMA], primary term [0], message [after existing recovery][shard id [[logstash-apache-error-2018.09.17][1]], allocation id [647TJjG4QzeLOmtOtFAHMA], primary term [0], message [after existing recovery]]]])] took [39.1s] above the warn threshold of 30s\n","stream":"stdout","time":"2018-10-09T22:13:30.826204607Z"}
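The warnings above show cluster state updates taking ~39s, well over the 30s warn threshold. In case it helps others reproduce the picture, here is a minimal set of diagnostics for a red cluster (a sketch assuming the default REST endpoint at localhost:9200; adjust host/port for your setup):

```shell
# Overall cluster status, node count, and unassigned shard count
curl -s 'localhost:9200/_cluster/health?pretty'

# Which indices are red
curl -s 'localhost:9200/_cat/indices?v&health=red'

# List shards that are not allocated
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED

# Ask the cluster why a shard is unassigned (available since 5.0)
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'

# Pending cluster state update tasks (a long queue points at an overloaded master)
curl -s 'localhost:9200/_cat/pending_tasks?v'
```

The shard-recovery warnings in the logs, combined with daily `logstash-*` indices each carrying multiple shards, suggest the shard count may be the thing to check: at 23 GB/day spread over several index patterns, per-day-per-type indices accumulate shards quickly, and cluster state updates slow down as the shard count grows.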