Metricbeat - Sparsity - Best Practices

SKumarMN · April 24, 2018, 5:43am

Hi All,

I have a question about storing the metrics of various applications monitored by Metricbeat in the same Metricbeat index. For example, suppose I am monitoring system, Apache, Nginx, and MongoDB using Metricbeat. From a query-performance and best-practices perspective, is it fine/advisable to push all the metrics into the same metricbeat-weekly index? My question is primarily about sparsity: my mappings would have fields for all applications, but each document contains only a subset of the metrics. Wouldn't that lead to a sparsity issue when storing doc_values?

rcowart · April 24, 2018, 5:50am

This is no longer an issue as of 6.0. Along with ES 6.0 came Lucene 7, which provided support for sparse doc values. This eliminates the storage overhead for "placeholders" where no data existed. So storage volume is reduced, which also frees up page cache space for more real data. IIRC, testing with Metricbeat resulted in an approx. 30% reduction, with a related increase in performance.

SKumarMN · April 24, 2018, 5:58am

Thanks @rcowart for the reply.
I have two questions:

1. Is it a best practice to store metrics of different applications in the same index, for scaling and performance?
2. I am currently using ES 5.6.5. How much of an issue would sparsity be for query performance on 5.6.5, and what should the index creation strategy be: group everything into a single index, or use one index per application type, each with a single primary shard?

rcowart · April 24, 2018, 6:14am

I really think it depends on the size of the indices. When an index becomes really large, indexing new data will become slower. Having lots and lots of small indices can also add unnecessary overhead, and problems with things like excessive open file handles.

For time series data like Beats data and logs, target index sizes of around 10-30GB. If you need indices to be larger, use weekly or monthly indices instead of daily. If you need them to be smaller, you could split the data out into multiple indices.
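As a rough illustration of that sizing arithmetic, here is a minimal Python sketch. The function name, the 30-day month, and the idea of estimating from a flat daily ingest rate are assumptions made for the example; only the 10-30GB target range comes from the advice above.

```python
# Hypothetical helper: the names and the 30-day month are illustrative assumptions.
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def pick_index_period(gb_per_day, min_gb=10.0, max_gb=30.0):
    """Return (period, estimated_index_gb) for the shortest period whose
    estimated index size falls within [min_gb, max_gb], or None if no
    period fits."""
    for name, days in PERIOD_DAYS.items():  # dicts keep insertion order
        size = gb_per_day * days
        if min_gb <= size <= max_gb:
            return name, size
    return None


if __name__ == "__main__":
    for rate in (0.5, 2.0, 20.0, 100.0):
        print(rate, "GB/day ->", pick_index_period(rate))
```

A result of None means no single period lands in the target range: with very high ingest the data would have to be split across more indices, and with very low ingest even a monthly index stays small.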

If splitting different apps into different indices gives you optimal index sizes AND reduces sparsity, that is the best of both worlds.

SKumarMN · April 24, 2018, 6:27am

Thanks @rcowart for your quick reply.

I have a question about the optimal index size for time series data.
In the Elastic blog post "How many shards should I have in my Elasticsearch cluster?" I see the following:

"TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size."

So my question is: if we go with 10-30GB per index, won't we be left with many shards?

rcowart · April 24, 2018, 11:11am

The point is... if splitting up the data per application means that you have a bunch of indices smaller than a few GB, then you are better off either increasing the time period of data in the indices (e.g. monthly instead of daily) or keeping all of the apps together.
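That rule of thumb can be sketched in Python as follows. This is only an illustration: the function name, the 3GB stand-in for "a few GB", the 30-day month, and the per-application ingest rates are all assumptions, not figures from this thread.

```python
# Hypothetical planner: the 3 GB floor is an assumed reading of "a few GB".
PERIODS = (("daily", 1), ("weekly", 7), ("monthly", 30))


def plan_indices(gb_per_day_by_app, min_gb=3.0):
    """Suggest an index layout from per-application daily ingest (GB/day).

    Returns ("per-app", period) for the shortest period at which every
    per-application index reaches min_gb, or ("combined", None) if even
    monthly per-application indices would be too small.
    """
    for name, days in PERIODS:
        if all(gb * days >= min_gb for gb in gb_per_day_by_app.values()):
            return ("per-app", name)
    return ("combined", None)


if __name__ == "__main__":
    print(plan_indices({"system": 1.0, "apache": 0.5}))
    print(plan_indices({"system": 1.0, "mongodb": 0.05}))
```

The sketch tries lengthening the time period first and falls back to a single combined index only when some application is too small even at monthly granularity.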

If you have so much data that you would have lots of indices larger than 50GB, you will need to create a larger cluster (add more nodes).

system · May 22, 2018, 11:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
