I am sending metrics via collectd at a 10s interval. My daily index comes out to 25-27 GB, which seems a bit massive. The index settings are as follows: "number_of_shards": "5",
"number_of_replicas": "1". Each message contains a total of 12 key-value pairs. Is there something else that could be making this index so large?
Elasticsearch does a number of things to make your data more easily searchable, all of which will add some overhead to your documents, and therefore index size -- On top of that you also have 1 replica, so the data volume will be doubled immediately.
Analysed fields (or text fields in ES 5.x and later) add overhead because each field is split up into individual terms -- If you have any string fields that don't require full text search (i.e. you know the exact thing you want to search for), consider setting those fields to not_analyzed, or to the keyword type in ES 5.x and later.
You also have two additional special fields that will cause bloat in the size of your documents:
_source -- The original JSON document that was indexed, stored as-is so it can be returned in search results. It is enabled by default.
_all (removed in 6.x) -- A concatenation of all the values of your fields, for when you want to search for a specific value but don't care which field it is in. Off the top of my head, I'm not sure whether this is enabled by default.
All of the above will use additional storage, easily resulting in an indexed document that's larger than the original, up to 2x, 3x, etc. You can tweak those settings in your index (and also not index any fields you don't care about), which could help save some space.
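To make the suggestions above a bit more concrete, here is a minimal sketch in Python that only builds an index template body as a dict; it does not talk to a cluster. It assumes a 6.x-style template ("index_patterns"; 5.x used "template" instead) and a mapping type called "doc". The function name, the index pattern and the field names are hypothetical, so substitute whatever your collectd pipeline actually emits. The _all switch only makes sense on versions before 6.x.

```python
import json


def build_metrics_template(index_pattern, unsearched_fields=(), disable_all=False):
    """Build a template body (dict) that avoids full-text overhead.

    All names here are illustrative assumptions, not taken from the thread.
    """
    mapping = {
        # Map every dynamically detected string field to `keyword`
        # instead of the default analysed text field.
        "dynamic_templates": [
            {
                "strings_as_keywords": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        # Fields you never search on: keep them in the document,
        # but do not build an inverted index for them.
        "properties": {
            name: {"type": "keyword", "index": False}
            for name in unsearched_fields
        },
    }
    if disable_all:
        # Only relevant before 6.x, where _all could still be enabled.
        mapping["_all"] = {"enabled": False}

    return {
        "index_patterns": [index_pattern],
        "settings": {"number_of_shards": 5, "number_of_replicas": 1},
        "mappings": {"doc": mapping},
    }


if __name__ == "__main__":
    body = build_metrics_template(
        "collectd-*",                      # hypothetical index pattern
        unsearched_fields=["type_instance"],  # hypothetical field name
        disable_all=True,
    )
    print(json.dumps(body, indent=2))
```

The printed JSON could then be sent to the index template API with whatever client you already use. Note that this leaves _source alone, since disabling it was reported to break things.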
@Evesy we tried to disable the _source option, but that seems to break a lot of stuff, so that is not an option for us. I am still curious as to how this data set is so large. We did a trial run writing straight to disk, and it comes out to about 0.5 GB per day, but ES seems to make it 10-20x that. Is there something I am missing in what is being done within ES? Thanks for any help!
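For anyone wanting to pin down the actual blow-up, here is a rough Python sketch of the arithmetic using the figures quoted in this thread (10s interval, 0.5 GB/day raw, 25-27 GB/day index, 1 replica). The thread does not say whether the 25-27 GB figure counts the replica, so that is a parameter here rather than a fact; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_s):
    """Number of samples one metric source emits per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_s


def overhead_factor(index_gb, raw_gb, replicas=0, size_includes_replicas=False):
    """Ratio of primary index size to raw on-disk size.

    If the reported index size already counts replica copies, divide them
    out first so that only primary data is compared with the raw data.
    """
    primary_gb = index_gb / (1 + replicas) if size_includes_replicas else index_gb
    return primary_gb / raw_gb


if __name__ == "__main__":
    print("samples per source per day:", samples_per_day(10))
    for size_gb in (25.0, 27.0):
        for includes in (True, False):
            factor = overhead_factor(size_gb, 0.5, replicas=1,
                                     size_includes_replicas=includes)
            print(f"{size_gb} GB, replica counted={includes}: {factor:.0f}x raw")
```

With the numbers from this thread the ratio lands somewhere between roughly 25x and 54x, depending on that assumption. Comparing the pri.store.size and store.size columns of the _cat/indices API should show which case applies.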