We have just switched our logging cluster from r3.xlarge nodes to the new i3.xlarge. The improved performance let us shut down a third of our nodes, and we are paying less per node for the i3.xlarge than we were for the r3.xlarge, so we have roughly halved our costs!
We only keep a copy of a day's worth of logs on the 'front'. All data from recently completed logs is backed up to s3 via the cloud-aws plugin. Yes, the storage is ephemeral, but we would have to lose multiple nodes to lose data, and since the indexes are aware of which AWS availability zone they are in, losing a whole zone would not lose us data either. Still, things can happen, so as a last resort we have the original pre-logstash source files to go back to. Currently we ship those with rsyslog, but we plan to switch to Kafka, which would also hold the post-logstash data.
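For anyone wanting to replicate the s3 backup and zone-awareness setup, it looks roughly like this with the cloud-aws plugin; a sketch only, and the repository name, bucket, and region below are placeholders, not our real values:

```shell
# Hypothetical example: register an S3 snapshot repository
# (repository name, bucket, and region are placeholders).
curl -XPUT 'localhost:9200/_snapshot/logs_backup' -d '{
  "type": "s3",
  "settings": {
    "bucket": "my-log-archive",
    "region": "us-east-1"
  }
}'

# Zone awareness in elasticsearch.yml, so replica shards are
# allocated in a different AWS availability zone to their primaries:
#
#   cloud.node.auto_attributes: true
#   cluster.routing.allocation.awareness.attributes: aws_availability_zone
```

With auto attributes enabled, the cloud-aws plugin tags each node with its availability zone, so a whole-zone outage still leaves a full copy of every shard.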
Older data goes to an r3.xlarge instance with lots of EBS disk space. We drop the replica shards and optimise the indices at that point, but all the data has already been backed up to s3 and is rarely used, so we can cope with a relatively long restore time for it. Two further smaller nodes are used as dedicated master-only nodes, as we do not allow the 'front' nodes to be masters.
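The archiving step above can be sketched as follows, assuming the Elasticsearch 2.x-era APIs that match the cloud-aws plugin; the index name is hypothetical:

```shell
# Hypothetical index name; run once an index has rolled off the 'front'.
# Drop the replica shards -- the data is already snapshotted to s3.
curl -XPUT 'localhost:9200/logstash-2017.01.01/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# Optimise down to a single segment to cut disk usage on the archive node
# (this endpoint was renamed _forcemerge in later versions).
curl -XPOST 'localhost:9200/logstash-2017.01.01/_optimize?max_num_segments=1'

# elasticsearch.yml on the two dedicated master nodes:
#
#   node.master: true
#   node.data: false
#
# and on the 'front' data nodes:
#
#   node.master: false
```

Keeping the masters off the heavily loaded front nodes means an indexing spike cannot take out cluster coordination at the same time.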