How to reduce Index size on disk?


(Vikas Gopal) #1

Hi Experts,

I am using ES 1.7.1 and have an index that consumes 20 GB of disk space, even after removing unnecessary fields. I am also setting index.store.compress.stored: true in my template.

Can anyone suggest what else I can do to reduce my index size?

Thanks
VG


(Vikas Gopal) #2

Strange that I haven't received any suggestions on my query. Anyway, here are some steps I performed to reduce index size. I hope it helps someone. Please feel free to add more in case I missed something.

  1. Delete unnecessary fields (or do not index unwanted fields; I am handling this at the LS level).
  2. Delete the @message field (if the message field is not in use, you can delete it).
  3. Disable the _all field (be careful with this setting).
    It is a special catch-all field that concatenates the values of all other fields into one big string, using a space as a delimiter. It requires extra CPU cycles and uses more disk space. If not needed, it can be completely disabled.
    Benefit of keeping _all enabled: it allows you to search for values in documents without knowing which field contains the value, at the cost of extra CPU.
    Downside of disabling it: the Kibana search bar will no longer act as a full-text search bar, so users have to fire queries like name:"vikas" or name:vika* (provided name is an analyzed field). Also, the _all field loses the distinction between field types (string, integer, or IP) because it stores all values as strings.
  4. Analyzed vs. not-analyzed fields: be very careful when deciding whether a field should be analyzed. Partial searches (name:vik*) need an analyzed field, but analyzed fields consume more disk space. A recommended approach is to make all string fields not_analyzed in the first go, and then make a field analyzed only if it turns out to be needed.
  5. Doc values: doc values are an on-disk data structure, built at document index time. They offload the heap burden by writing fielddata to disk at index time, allowing Elasticsearch to load the values outside your Java heap as they are needed. In the latest versions of ES this feature is enabled by default; in our case, on ES 1.7.1, we have to enable it explicitly. It consumes extra disk space, but it does not degrade performance, and the overall benefits of doc values significantly outweigh the cost.
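Steps 3–5 above can all be applied through an ES 1.x index template mapping. A minimal sketch follows; the template name, index pattern, and the "name" field are only illustrative assumptions, not taken from my actual setup:

{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}

PUT this to _template/ before indexing so new indices pick it up; existing indices keep their old mapping.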

Thanks
VG


#3

What is that "@message" field?


(Vikas Gopal) #4

It's a default field that holds the complete raw data or log.


#5

Could you please give me some directions? I can't find anything about it in the docs.


(Vikas Gopal) #6

Sorry for the late reply, Kazama.
This is how you can achieve the above in LS.
Inside your filter {} you can add mutate { remove_field => ["message"] }. Make sure you do this after the actual parsing of your data. This will drop the entire message field before the event reaches ES.
Sample:

input {
  file {
    # --------
  }
}
filter {
  grok { match => ["message", "abc:%{CEFNUM:test}[^\|\n]*\]"] }
  mutate {
    # --------
  }
  kv {
    # ------
  }
  mutate { remove_field => ["message"] }
}
output {
  elasticsearch {
    action => "index"
    hosts  => ["localhost:9200"]
    index  => "test1"
  }
}

I hope this helps.

Regards
VG


#7

Thanks for your time!

So the reason I couldn't find anything in the ES docs is that the @message field is an LS-only feature.

