As a newbie in Elasticsearch, I'm wondering what your tips would be for optimizing my indexes to reduce their disk footprint.
My setup:
3 nodes, 3 shards, 1 replica.
At the moment I'm indexing around 6 million documents per day, which costs me roughly 6-8 GB of disk space (it varies). I've used the mutate filter of Logstash to remove some unneeded fields from the Apache logs I store, like:
```
remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]
```
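For reference, here is what that looks like as a complete filter block in the Logstash pipeline config (the field names are the ones from my logs; the surrounding `filter`/`mutate` syntax is the standard Logstash form):

```
filter {
  mutate {
    # Drop fields that are never queried so each indexed document is smaller
    remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]
  }
}
```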
I wonder whether there is anything I can put in elasticsearch.yml that would help reduce disk usage. I set index.compress.stored: true but I can't see any dramatic change.
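For what it's worth, stored-field compression is enabled by default in recent Elasticsearch releases, which would explain why index.compress.stored makes no visible difference. If you are on Elasticsearch 2.0 or later, you can trade some CPU for disk by switching to the best_compression codec. Note this is an index-level setting applied at index creation, not an elasticsearch.yml option; a sketch using an index template so it applies to future Logstash indices (the template name here is just an example):

```
curl -XPUT 'localhost:9200/_template/logstash_compression' -d '{
  "template": "logstash-*",
  "settings": {
    "index.codec": "best_compression"
  }
}'
```

Existing indices keep their old codec; only indices created after the template is in place pick this up.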
The amount of space the data takes up on disk once indexed depends on the fields you have in the documents as well as your mappings. You can optimise the mappings used by Logstash to reduce the size, and this blog post contains a discussion of what can be done and what the tradeoffs are.
It seems you're running a really, really old version of Logstash (1.1 or thereabouts). It doesn't matter for your disk space, but I think you should look into upgrading.
That's weird. @message was a standard field in old Logstash releases, but it was renamed to message. Same thing with @source, IIRC. Anyway, this is unrelated to your question.
The '_source' field can be disabled in the index mapping. '_source' is useful to have, though, so before you remove it I would recommend looking at how you map the fields you are actually indexing, and also consider whether you need the '_all' field or not. This is described in the blog post I linked to earlier.
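As a sketch of the kind of mapping changes that help, assuming a Logstash-style index template and pre-5.x string mappings (the field names below are illustrative, not from your logs): disabling '_all' avoids indexing every field twice, and marking fields you only filter on as not_analyzed skips the analysis chain for them:

```
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "properties": {
        "clientip": { "type": "string", "index": "not_analyzed" },
        "verb":     { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

Disabling '_source' on top of that saves more, but you lose the ability to reindex, highlight, or see the original document, which is why I'd treat it as a last resort.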