I am trying to come up with the cluster sizing for our centralised logging system. While reading about the factors that influence sizing, I came across two terms:
JSON conversion factor
Index conversion factor
I'm not sure what these are. I tried to find them on the Elastic blog but couldn't find anything useful. Can someone explain them or direct me to a blog post where I can find an explanation?
These were discussed in this webinar and this blog post, and they are basically a way to reason about how raw data is transformed into indexed data on disk. We often first convert the raw data into JSON documents, and how much this changes the size depends on how we parse and structure the data, as well as how much enrichment data we add. This is what we refer to as the JSON conversion factor, and it can vary a lot between different types of data.
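As a rough illustration (not from the webinar or blog post; the log line and field names below are made up), you can estimate the JSON conversion factor for a sample of your own data by comparing the byte size of a raw log line with the byte size of the structured, enriched JSON document it becomes:

```python
# Sketch: estimate the JSON conversion factor for one sample event by
# comparing raw bytes with the parsed-and-enriched JSON document's bytes.
import json

raw_line = '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Hypothetical parsed and enriched document; field names are illustrative.
doc = {
    "@timestamp": "2023-10-10T13:55:36.000Z",
    "source": {"ip": "203.0.113.5"},
    "http": {
        "request": {"method": "GET"},
        "response": {"status_code": 200, "body": {"bytes": 2326}},
    },
    "url": {"path": "/index.html"},
    "geoip": {"country_iso_code": "US"},  # enrichment adds bytes too
}

raw_bytes = len(raw_line.encode("utf-8"))
json_bytes = len(json.dumps(doc).encode("utf-8"))
print(f"JSON conversion factor ~ {json_bytes / raw_bytes:.2f}")
```

In practice you would run this over a representative sample of events rather than a single line, since the factor varies by data type.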
Once we index this into Elasticsearch, the size changes again. How much typically depends on the data itself, the mappings used, and the index settings and shard size. This is what we refer to as the index conversion factor.
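One way to measure the index conversion factor empirically (a sketch assuming the official Python client and a test index named logs-benchmark, which are my choices, not from the original posts) is to index a known volume of JSON into a test index and compare it against the primary store size reported by the index stats API:

```python
# Sketch: measure the index conversion factor for a test index by comparing
# the JSON volume you indexed with the primary store size on disk.
# Consider running a force merge first so the reported size is stable.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The JSON volume you sent, measured at ingest time (10 GiB here as an example).
json_bytes_indexed = 10 * 1024**3

stats = es.indices.stats(index="logs-benchmark", metric="store")
primary_store_bytes = stats["indices"]["logs-benchmark"]["primaries"]["store"]["size_in_bytes"]

print(f"index conversion factor ~ {primary_store_bytes / json_bytes_indexed:.2f}")
```

Because mappings and settings affect this factor, measure it with the same mappings you plan to use in production.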
To estimate the size of the primary shards on disk, we take the raw data volume and multiply it by these two factors. This approach was chosen because it is relatively easy to test and use in benchmarks.
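Putting the two factors together, a back-of-the-envelope sizing calculation looks like the sketch below. The factor values, retention period, and replica count are made-up examples; substitute the numbers you measured on your own data.

```python
# Sketch of the overall sizing arithmetic with illustrative example values.
raw_gb_per_day = 100    # raw log volume ingested per day
json_factor = 1.2       # hypothetical: JSON documents 20% larger than raw
index_factor = 0.9      # hypothetical: indexed data slightly smaller than JSON
retention_days = 30     # how long data is kept

primary_gb = raw_gb_per_day * json_factor * index_factor * retention_days
print(f"primary shard storage: {primary_gb:.0f} GB")

# Replicas multiply the on-disk footprint beyond the primaries.
replicas = 1
print(f"total with {replicas} replica(s): {primary_gb * (1 + replicas):.0f} GB")
```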