Data model for time series - coarse vs fine grained


(Bertrand Renuart) #1

We are storing about 300M data points every day in daily indexes.
Our original approach created one document per data point, like this:

{
   "@timestamp": "20170516T130000.000Z",
   "host": "hostname",
   "metric": "jvm.mem.heap.used",
   "value": 1234
}

We recently changed the model to build larger documents that group related metrics together, a bit like what Beats does:

{
   "@timestamp": "20170516T130000.000Z",
   "host": "hostname",
   "jvm": {
      "mem": {
         "heap": {
            "used": 1234,
            "committed": 5678,
            "max": 9999
         }
      }
   }
}
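For anyone wanting to try the same thing, the grouping step can be sketched roughly as below. This is only an illustration, not our actual pipeline: the function name `group_points` is hypothetical, and it assumes every point carries the four fields shown above and that points sharing the same `@timestamp` and `host` belong in one document.

```python
def group_points(points):
    """Merge flat per-metric points into one nested document per (timestamp, host).

    Hypothetical helper for illustration; assumes each point has the fields
    "@timestamp", "host", "metric" (dotted name) and "value".
    """
    docs = {}
    for p in points:
        key = (p["@timestamp"], p["host"])
        doc = docs.setdefault(key, {"@timestamp": p["@timestamp"], "host": p["host"]})
        # Walk the dotted metric name, creating nested objects as needed.
        # Note: a metric named "a.b" alongside "a.b.c" would conflict here.
        *parents, leaf = p["metric"].split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = p["value"]
    return list(docs.values())


points = [
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.used", "value": 1234},
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.committed", "value": 5678},
    {"@timestamp": "20170516T130000.000Z", "host": "hostname",
     "metric": "jvm.mem.heap.max", "value": 9999},
]

docs = group_points(points)
```

With the three sample points, `docs` holds a single document shaped like the coarse example above. How much the document count drops in practice depends on how many metrics share a timestamp and host.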

This strategy reduced the number of documents stored in the indexes by about 50% while keeping the same information.
We also noticed a drop in the heap memory required by the indexes. This reduction is mainly due to having fewer document UIDs in memory (about 50% fewer, as mentioned).

At first sight, this coarse model looks more efficient than the fine-grained one.
Has anybody else come to the same conclusion?
Do you see any drawbacks?
Experience and comments are welcome...

Thanks.

/Bertrand

