ELK cluster disk space usage optimization


(Nikola Kolev) #1

Hello,

As a newbie to Elasticsearch, I'm wondering what tips you have for optimizing my indexes to keep their size on disk down.

My setup:
3 nodes, 3 shards, 1 replica.

At the moment I'm indexing around 6 million documents per day, which costs me 6-8 GB of disk space (it varies). I've used the Logstash mutate filter to remove some unneeded fields from the Apache logs I store, like:
remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]

Is there anything I can put in elasticsearch.yml to help reduce disk usage? I added the setting index.compress.stored: true, but I can't see any dramatic change.

Thanks in advance.


(Christian Dahlqvist) #2

The amount of space the data takes up on disk once indexed depends on the fields you have in the documents as well as your mappings. You can optimise the mappings used by Logstash to reduce the size, and this blog post contains a discussion of what can be done and what the tradeoffs are.


(Magnus Bäck) #3
remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]

Do you really have @message and @source fields? What Logstash are you running?


(Nikola Kolev) #4

Yep, is it wrong?


(Magnus Bäck) #5

It seems you're running a really really old version of Logstash (1.1 or something). It doesn't matter for your disk space, but I think you should look into upgrading.


(Nikola Kolev) #6

How did you decide that I'm using 1.1?

My version is 1.5.4...


(Magnus Bäck) #7

That's weird. @message was a standard field in old Logstash releases but the field was renamed to message. Same thing with @source IIRC. Anyway, this is unrelated to your question.


(Nikola Kolev) #8

Actually, I just figured that out myself. Now my filter looks like this:

remove_field => [ "message", "_source", "@source", "ident", "auth", "ZONE" ]

but for some reason I am still seeing the source field :(

Any ideas why?


(Christian Dahlqvist) #9

The '_source' field has to be disabled in the index mapping. '_source' is useful to have, so before you remove it I would recommend looking at how you map the fields you are actually indexing and also consider whether you need the '_all' field or not. This is described in the blog post I linked to earlier.
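
For illustration, here is a minimal sketch of what such a mapping change could look like, written as a Python script that prints an index template body. It assumes Elasticsearch 1.x mapping syntax (a guess based on the Logstash 1.5.4 mentioned above; later major versions dropped both '_all' and the "string" type). The function name build_logstash_template and the "logstash-*" pattern are hypothetical, and marking every string field not_analyzed is my own assumption about what is acceptable for these logs, not something stated in this thread.

```python
import json


def build_logstash_template(index_pattern="logstash-*"):
    """Build an index template body (assumed Elasticsearch 1.x syntax).

    Hypothetical helper: it disables the '_all' field and maps every
    string field as not_analyzed. '_source' is left enabled on purpose.
    """
    return {
        # Indices whose names match this pattern pick up the mappings below.
        "template": index_pattern,
        "mappings": {
            "_default_": {
                # '_all' stores a second, analyzed copy of every field value.
                "_all": {"enabled": False},
                "dynamic_templates": [
                    {
                        "string_fields": {
                            "match": "*",
                            "match_mapping_type": "string",
                            # Assumption: exact-match search is sufficient,
                            # so the analyzed form of each string is skipped.
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                            },
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body so it can be reviewed before it is uploaded.
    print(json.dumps(build_logstash_template(), indent=2))
```

The printed JSON would be sent to the cluster's _template endpoint and only affects indices created afterwards, so existing daily indices keep their old mappings. The tradeoffs: with '_all' disabled, queries have to name a field, and not_analyzed fields only match on their exact value.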

