I have noticed that Elasticsearch is very generous when it comes to consuming storage. This becomes a challenge when storage requirements keep growing and the underlying storage is costly (e.g. SSD).
Is there a way to manage this, for example with compression?
Does Elastic support compression methods?
If so, do you need a license for it, or is it free?
What is the best practice for keeping storage at a manageable level?
Thank you for the reply. That is more about getting rid of unwanted fields, right? How about compression? Does Elastic have some form of compression we can use to reduce the size that old data takes up? I am coming from a SIEM background, so I hope you get what I am actually trying to do.
Elasticsearch compresses data by default, and the link I provided discusses the best_compression codec, which compresses the _source further. Making sure data is mapped efficiently, rather than relying on the default dynamic mappings unless necessary, can, as the docs describe, substantially reduce the size on disk.
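As a minimal sketch (the index and field names here are made up for illustration, not taken from this thread), enabling the best_compression codec and defining an explicit mapping at index creation could look something like this in the Kibana Dev Tools console:

```
PUT logs-firewall-000001
{
  "settings": {
    "index.codec": "best_compression"
  },
  "mappings": {
    "properties": {
      "@timestamp":   { "type": "date" },
      "source_ip":    { "type": "ip" },
      "event_action": { "type": "keyword" },
      "message":      { "type": "text" }
    }
  }
}
```

index.codec is a static setting, so it has to be set when the index is created (or while the index is closed). best_compression trades a bit of extra CPU at indexing and merge time for smaller stored fields, i.e. the _source.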
This blog post discusses storage size and optimization. It is a bit old and refers to an older version of Elasticsearch (for example, the _all field is no longer available in newer versions), but the principles are still valid.
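To illustrate the mapping side of that advice, here is a rough sketch (field names are hypothetical) of an explicit mapping that avoids the dynamic default of indexing every string as both text and keyword, and that stops indexing a field that is only ever retrieved, never searched or aggregated:

```
PUT logs-firewall-000002
{
  "mappings": {
    "properties": {
      "event_action": { "type": "keyword" },
      "bytes_sent":   { "type": "long" },
      "raw_event": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      }
    }
  }
}
```

With index and doc_values disabled, raw_event is still returned as part of _source, but no inverted index or doc_values structures are built for it on disk.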
It will depend a lot on the data, the size of the documents, and how large a portion of the index size the _source accounts for. For some sample data sets I have seen a reduction in storage size of around 20%, which roughly matches the data in the blog post I linked to.
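For existing (e.g. older SIEM) indices, a sketch of retrofitting best_compression and then checking the effect on disk might look like this, assuming a hypothetical index name of logs-firewall-000001:

```
POST logs-firewall-000001/_close

PUT logs-firewall-000001/_settings
{
  "index.codec": "best_compression"
}

POST logs-firewall-000001/_open

POST logs-firewall-000001/_forcemerge?max_num_segments=1

GET _cat/indices/logs-firewall-*?v&h=index,docs.count,pri.store.size
```

The new codec only applies to segments written after the change, so the force merge is what actually rewrites the old segments. Force merging is expensive, so it is normally only done on indices that are no longer being written to.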