Metrics and sparse indexes


(Gavin Lambert) #1

I'm looking to store several time-series metrics in an ES system (currently 5.5) and I'm trying to work out the most recommended index/field design for this sort of thing.

There's only a small number of distinct types of metrics (probably around 20), but each metric has a couple of grouping keyword fields, and some of those fields have a relatively large number of possible values (they are user-specified, so technically unbounded, although probably fewer than 10,000 distinct values in most cases).

In particular I'm trying to work out whether it is better to use one index per metric type or one index for all metrics with sparse fields (eg. "metricname.key1"). Other than a common timestamp field the different metrics don't really have any structure in common. (There are a couple of metrics that do share the same grouping keys, and I assume it makes the most sense to always store these in the same index regardless.)
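To make the two options concrete, here is a rough sketch of how the same data point would be shaped under each layout. The metric name `queue_depth`, the grouping keys `queue` and `host`, and the index names are all made up for illustration; nothing is sent to a cluster.

```python
from datetime import datetime, timezone


def per_metric_doc(metric, keys, value, ts):
    """Option A: one index per metric type, each with its own dense fields."""
    body = {"@timestamp": ts.isoformat(), "value": value}
    body.update(keys)
    return "metrics-" + metric, body


def shared_index_doc(metric, keys, value, ts):
    """Option B: one shared index, fields namespaced as "<metric>.<key>"."""
    body = {"@timestamp": ts.isoformat(), metric: dict(keys, value=value)}
    return "metrics", body


ts = datetime(2017, 8, 1, 12, 0, tzinfo=timezone.utc)
keys = {"queue": "orders", "host": "web-01"}  # hypothetical grouping keys

index_a, doc_a = per_metric_doc("queue_depth", keys, 17, ts)
index_b, doc_b = shared_index_doc("queue_depth", keys, 17, ts)
print(index_a, doc_a)
print(index_b, doc_b)
```

In option B every document populates only the fields under its own metric name, so every other metric's fields are empty for that document, which is where the sparsity comes from.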

Most of the recent recommendations about types say to avoid storing multiple document types in one index and to use separate indexes instead, partly because ES types are being removed and partly because they cause sparse doc values, which are inefficient. However, creating multiple indexes creates extra shards, and I'm not sure which is worse.

The incoming data rate for these metrics is going to be very low (probably just a couple dozen or so points per metric per minute), so I'm assuming it can all be handled by a single node with a single shard per index. And maybe this is small enough that sparsity and/or shard concerns don't matter? That's part of what I'm wondering.
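As a back-of-envelope check on that rate, assuming "a couple dozen" means 24 points per metric per minute and exactly 20 metric types (both are rough guesses on my part):

```python
METRIC_TYPES = 20                   # "probably around 20"
POINTS_PER_METRIC_PER_MINUTE = 24   # assumption: "a couple dozen"

docs_per_minute = METRIC_TYPES * POINTS_PER_METRIC_PER_MINUTE
docs_per_day = docs_per_minute * 60 * 24

print(docs_per_minute)  # 480
print(docs_per_day)     # 691200
```

So the total across all metrics is on the order of a few hundred documents per minute, or well under a million per day.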

(And I'm assuming some kind of doc-count or time-based rollover of each index to multiple actual indexes in addition to this.)
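For the rollover part, I have in mind something like the Rollover Index API (`POST /<alias>/_rollover` with `max_age` / `max_docs` conditions). A minimal sketch that only builds the request; the alias name `metrics-write` and the thresholds are placeholders, not recommendations:

```python
import json


def rollover_request(alias, max_age="7d", max_docs=1000000):
    """Build (method, path, body) for a rollover call on a write alias."""
    path = "/{}/_rollover".format(alias)
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", path, body


# "metrics-write" is a hypothetical alias pointing at the current index.
method, path, body = rollover_request("metrics-write")
print(method, path)
print(json.dumps(body, indent=2))
```

With one index per metric type, each metric would need its own alias and its own rollover call, which is part of why the shard count grows.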

When I look at Metricbeat, which I think has a similar sort of data pattern, it appears to take the approach of cramming everything into one index without worrying about sparse fields. I'm not sure if this is relevant to my data or not.

The data is mostly intended for time-series visualisations, so will be aggregation-heavy (and mostly on only one metric at a time in any given graph) with very little search.
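A typical graph would translate into something like the request body below: a `date_histogram` over the timestamp, split by one grouping key, with an average per bucket. The field names `@timestamp`, `value` and `host` are hypothetical, and this only builds the body rather than querying anything.

```python
import json


def metric_over_time(key_field, interval="1m", since="now-1h"):
    """Search body: per-interval average of one metric, split by a grouping key."""
    return {
        "size": 0,  # aggregations only, no hits
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {
            "over_time": {
                # "interval" is the parameter name used by ES 5.x
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {
                    "by_key": {
                        "terms": {"field": key_field, "size": 10},
                        "aggs": {"avg_value": {"avg": {"field": "value"}}},
                    }
                },
            }
        },
    }


query = metric_over_time("host")
print(json.dumps(query, indent=2))
```

The `terms` aggregation on a keyword field and the `avg` on the value both read doc values, which is why the sparse doc-values question matters for this workload.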

