How to reduce Index size on disk?


(Vikas Gopal) #1

Hi Experts,

I am using ES 1.7.1 and have an index that consumes 20 GB of disk space, even after removing unnecessary fields. I am also setting index.store.compress.stored: true in my template.

Can anyone suggest what else I can do to reduce my index size?

Thanks
VG


(Vikas Gopal) #2

Strange that I haven't received any suggestions on my query. Anyway, here are some steps I performed to reduce index size. I hope it helps someone. Please feel free to add more in case I missed something.

  1. Delete unnecessary fields (or do not index unwanted fields; I am handling this at the LS level).
  2. Delete the @message field (if the message field is not in use, you can delete it).
  3. Disable the _all field (be careful with this setting).
    It is a special catch-all field that concatenates the values of all other fields into one big string, using a space as a delimiter. It requires extra CPU cycles and uses more disk space. If not needed, it can be completely disabled.
    Benefit of keeping _all enabled: it allows you to search for values in documents without knowing which field contains the value, at the cost of extra CPU.
    Downside of disabling it: the Kibana search bar will no longer act as a full-text search bar, so users have to fire queries like name:"vikas" or name:vika* (provided name is an analyzed field). Also, the _all field loses the distinction between field types (string, integer, or IP) because it stores all values as strings.
  4. Analyzed vs. not-analyzed fields: be very careful when deciding whether a field should be analyzed. Partial searches (name:vik*) need an analyzed field, but analyzed fields consume more disk space. A recommended approach is to make all string fields not_analyzed in the first go, and then make a field analyzed only if it turns out to be needed.
  5. Doc values: doc values are an on-disk data structure, built at document index time. They offload the heap burden by writing fielddata to disk at index time, allowing Elasticsearch to load the values outside your Java heap as they are needed. In the latest versions of ES this feature is enabled by default; in our case, on ES 1.7.1, we have to enable it explicitly. It consumes extra disk space, but it does not degrade performance, and the overall benefits of doc values significantly outweigh the cost.
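Steps 3–5 above can all be applied through an ES 1.x index template mapping. A minimal sketch follows; the template name, index pattern, and the "name" field are only illustrative assumptions, not taken from my actual setup:

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}

PUT this to _template/ before indexing so new indices pick it up; existing indices keep their old mapping.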

Thanks
VG


#3

What is that "@message" field?


(Vikas Gopal) #4

It's a default field that holds the complete raw data or log.


#5

Could you please give me some directions? I can't find anything about it in the docs.


(Vikas Gopal) #6

Sorry for the late reply, Kazama.
This is how you can achieve the above in LS.
Inside your filter {} you can add mutate { remove_field => ["message"] }. Make sure you do this after the actual parsing of your data. This will drop the entire message field before the event reaches ES.
Sample:

input {
  file {
    # --------
  }
}
filter {
  grok { match => ["message", "abc:%{CEFNUM:test}[^\|\n]*\]"] }
  mutate {
    # --------
  }
  kv {
    # ------
  }
  mutate { remove_field => ["message"] }
}
output {
  elasticsearch {
    action => "index"
    hosts  => ["localhost:9200"]
    index  => "test1"
  }
}

I hope this helps.

Regards
VG


#7

Thanks for your time!

So the reason I couldn't find anything in the ES docs is that the @message field is an LS-only feature.

