ELK cluster disk space usage optimization

nkoleff · October 28, 2015, 9:13am

Hello,

As a newcomer to Elasticsearch, I'm wondering what tips you have for optimizing my indexes in order to reduce their size.

My setup:
3 nodes, 3 shards, 1 replica.

At the moment I'm indexing around 6 million documents per day, which costs me 6 to 8 GB of disk space depending on the day. I've used the Logstash mutate filter to remove some unneeded fields from the Apache logs I store, like this:
remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]
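
For scale, the figures above work out to roughly 1 KB or a little more per document. A minimal Python sketch of that arithmetic, assuming decimal gigabytes and that the quoted 6 to 8 GB is the total daily footprint on disk (the post does not say whether replicas are included):

```python
# Rough per-document disk cost from the figures quoted in the post.
DOCS_PER_DAY = 6_000_000
GB = 10**9  # assumption: decimal gigabytes, not GiB


def bytes_per_doc(gb_per_day, docs_per_day=DOCS_PER_DAY):
    """Average bytes of disk used per indexed document."""
    return gb_per_day * GB / docs_per_day


for gb in (6, 7, 8):
    print(f"{gb} GB/day -> {bytes_per_doc(gb):.0f} bytes per document")
```

If the quoted figure does include the single replica, the primary copy of each document would cost about half of that.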

Is there anything I can put in elasticsearch.yml to help reduce disk usage? I set index.compress.stored: true, but I can't see any dramatic change.

Thanks in advance.

Christian_Dahlqvist · October 28, 2015, 9:18am

The amount of space the data takes up on disk once indexed depends on the fields you have in the documents as well as your mappings. You can optimise the mappings used by Logstash to reduce the size, and this blog post contains a discussion around what can be done and what the tradeoffs are.

magnusbaeck · October 28, 2015, 11:35am

remove_field => [ "@message", "@source", "ident", "auth", "ZONE" ]

Do you really have @message and @source fields? What Logstash are you running?

nkoleff · October 28, 2015, 11:52am

yep, is it wrong?

magnusbaeck · October 28, 2015, 11:54am

It seems you're running a really really old version of Logstash (1.1 or something). It doesn't matter for your disk space, but I think you should look into upgrading.

nkoleff · October 28, 2015, 12:21pm

How did you decide that I'm using 1.1?

My version is 1.5.4...

magnusbaeck · October 28, 2015, 1:12pm

That's weird. @message was a standard field in old Logstash releases but the field was renamed to message. Same thing with @source IIRC. Anyway, this is unrelated to your question.

nkoleff · October 28, 2015, 1:29pm

Actually, I just figured that out myself. Now my filter looks like this:

remove_field => [ "message", "_source", "@source", "ident", "auth", "ZONE" ]

But for some reason I am still seeing the source field.

Any ideas why?

Christian_Dahlqvist · October 28, 2015, 3:59pm

The '_source' field has to be disabled in the index mapping. '_source' is useful to have, so before you remove it I would recommend looking at how you map the fields you are actually indexing and also consider whether you need the '_all' field or not. This is described in the blog post I linked to earlier.
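
As an illustration of the mapping changes mentioned above, here is a minimal Python sketch that builds an index template body which disables '_all' and stores string fields as not_analyzed. It assumes the Elasticsearch 1.x/2.x template syntax that was current when this thread was written (later releases changed it), and the `build_template` helper, the `logstash-*` pattern, and the `strings_not_analyzed` name are hypothetical choices for the example, not something taken from the blog post. '_source' is left enabled, in line with the advice to look at the other options first.

```python
import json


def build_template(pattern="logstash-*", disable_all=True):
    """Build an index template body (ES 1.x/2.x style, assumed)."""
    default_mapping = {
        # Map new string fields as not_analyzed: smaller index,
        # but no full-text search on those fields.
        "dynamic_templates": [
            {
                "strings_not_analyzed": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ]
    }
    if disable_all:
        # '_all' duplicates every field's text into one catch-all field.
        default_mapping["_all"] = {"enabled": False}
    return {"template": pattern, "mappings": {"_default_": default_mapping}}


# Print the JSON body that would be sent to the index template endpoint.
print(json.dumps(build_template(), indent=2))
```

The tradeoff is that with '_all' disabled, queries that do not name a field no longer search across every field, and not_analyzed strings only match on their exact value.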
