Data model for time series - coarse vs fine grained

Bertrand · May 16, 2017, 12:09pm

We store about 300M data points per day in daily indexes.
Our current approach creates one document per data point, which looks as follows:

{
   "@timestamp": "20170516T130000.000Z",
   "host": "hostname",
   "metric": "jvm.mem.heap.used",
   "value": 1234
}

We recently changed the model to build larger documents that group related metrics together, a bit like what Beats does:

{
   "@timestamp": "20170516T130000.000Z",
   "host": "hostname",
   "jvm": {
      "mem": {
         "heap": {
            "used": 1234,
            "committed": 5678,
            "max": 9999
         }
      }
   }
}
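
For illustration, here is a minimal sketch of how fine-grained points could be regrouped into the nested form above. It is not the actual pipeline: the function names `set_nested` and `group_points` are hypothetical, and it assumes that points are grouped by the pair (`@timestamp`, `host`) and that no metric name is a prefix of another (for example `jvm.mem` alongside `jvm.mem.heap.used`).

```python
def set_nested(doc, dotted_name, value):
    """Store value under a dotted metric name, e.g. 'jvm.mem.heap.used'."""
    *parents, leaf = dotted_name.split(".")
    node = doc
    for key in parents:
        # Create intermediate objects on demand.
        # Assumption: no metric name is a prefix of another metric name.
        node = node.setdefault(key, {})
    node[leaf] = value


def group_points(points):
    """Merge fine-grained points that share (@timestamp, host) into one document."""
    docs = {}
    for p in points:
        key = (p["@timestamp"], p["host"])
        doc = docs.setdefault(
            key, {"@timestamp": p["@timestamp"], "host": p["host"]}
        )
        set_nested(doc, p["metric"], p["value"])
    return list(docs.values())


# Three fine-grained points taken from the examples in this post.
points = [
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.used", "value": 1234},
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.committed", "value": 5678},
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.max", "value": 9999},
]

docs = group_points(points)
```

With these three sample points, `docs` holds a single document whose `jvm.mem.heap` object carries `used`, `committed` and `max`. The reduction seen in practice depends on how many metrics share a timestamp and host.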

This strategy reduced the number of documents stored in the indexes by about 50% while keeping the same information.
We also noticed a drop in the heap memory required by the indexes. This reduction is mainly due to having fewer document UIDs in memory (about 50% fewer, as mentioned).

At first sight, this coarse model looks more efficient than the fine-grained one.
Has anybody else come to the same conclusion?
Do you see any drawbacks?
Experience and comments are welcome...

Thanks.

/Bertrand

