Advice regarding Metricbeat sharding and retention

Hi,

I have a largish(?) metricbeat installation which collects from ~2000 machines every minute, generating a ~100GB index with ~135M docs per day. Currently I have the indices set to 5 shards x2 replication, which performs well enough when querying a few days of data. In this context, I have two questions:

  1. Would you recommend splitting the daily indexes by metricset to reduce the sizes so that aggregations over longer time periods (>7 days) run more efficiently? If I split by metricset, should I adjust the number of shards?
  2. What combination of forceMerge/shrink operations would be appropriate to consolidate older metricbeat data? I found this article, but it seems targeted at ES 2.0; is that advice still relevant?

Thanks in advance for any advice you can give!
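
For context, our daily index settings boil down to something like the template below. This is only a rough sketch in Python with requests; the cluster URL, template name, and the exact replica count are placeholders rather than our real config.

```python
# Rough sketch of the current setup: one template applying 5 primary
# shards and 2 replicas to the daily metricbeat-* indices.
# Cluster URL and template name are placeholders.
import requests

ES = "http://localhost:9200"

template = {
    "index_patterns": ["metricbeat-*"],  # the daily indices (ES 6.x syntax;
                                         # older versions use "template": "metricbeat-*")
    "settings": {
        "number_of_shards": 5,       # current primary shard count
        "number_of_replicas": 2,     # "x2 replication" (adjust if that means 1 replica)
    },
}

requests.put(f"{ES}/_template/metricbeat", json=template).raise_for_status()
```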

Great to hear that you have deployed Metricbeat at this scale.

For question 1:

  • What do your queries look like?
  • What is the size of each of your indices?
  • How many metricsets do you use and which ones? (shard explosion can be an issue here)

For question 2:

Hi ruflin, thanks for responding so quickly!

Most of our current queries against metricbeat come from Kibana/Grafana dashboards, so they are aggregations over host filters and time buckets. I expect most queries would only request data from a single metricset.

Each daily metricbeat index is about 100GB with 135M documents.

We use the diskio, filesystem, process, network, and cpu system metricsets. Your question led me to look at a histogram of doc-counts for each of these in 1 day, and the result was a little surprising, though it makes sense in hindsight:

diskio: 63M
filesystem: 22M
process: 22M
network: 14M
cpu: 2.5M
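
For reference, those counts can be pulled with a terms aggregation on the metricset.name field; a rough sketch of that, with a placeholder index name and cluster URL:

```python
# Sketch: count documents per metricset for one daily index using a
# terms aggregation on metricset.name. Index name and URL are placeholders.
import requests

ES = "http://localhost:9200"
index = "metricbeat-2018.01.01"   # placeholder daily index

query = {
    "size": 0,
    "aggs": {
        "per_metricset": {
            "terms": {"field": "metricset.name", "size": 20}
        }
    },
}

resp = requests.post(f"{ES}/{index}/_search", json=query)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["per_metricset"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```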

As for shrinking shards, I did look into this, but it seems that since we currently have 5 shards per index, the only option would be to shrink to 1 shard; is that correct?

Thanks again!

As you have quite a bit of data, I'm wondering if you should perhaps start using rollover (https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html) instead of daily indices. This would give you more predictable index sizes.
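
Roughly, a rollover call would look like the sketch below; the write alias name and the thresholds are just examples, not recommendations.

```python
# Sketch: roll over a hypothetical "metricbeat" write alias once the
# current index is a day old or has accumulated roughly a day of docs.
# Alias name and thresholds are examples only.
import requests

ES = "http://localhost:9200"

body = {
    "conditions": {
        "max_age": "1d",        # at most one day per index
        "max_docs": 150000000,  # ~135M docs/day, rounded up
    }
}

resp = requests.post(f"{ES}/metricbeat/_rollover", json=body)
resp.raise_for_status()
print(resp.json())  # tells you whether a rollover actually happened
```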

As you have only 5 metricsets, it would probably be an improvement to have one index per metricset, as the data would also be stored closer together. I have wanted to run some benchmarks for this with Rally for quite some time, to see whether sorting on ingest time could also make a difference, but I haven't gotten to it yet. So this is only my theory and not proven yet.
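
If you combine one index per metricset with rollover, each metricset needs its own write alias. A sketch of bootstrapping those (all index and alias names here are made up; Metricbeat's output would then have to be pointed at the per-metricset aliases):

```python
# Sketch: create one initial index plus write alias per metricset so
# each series can be rolled over independently. All names are made up.
import requests

ES = "http://localhost:9200"
METRICSETS = ["diskio", "filesystem", "process", "network", "cpu"]

for ms in METRICSETS:
    index = f"metricbeat-{ms}-000001"
    body = {"aliases": {f"metricbeat-{ms}": {}}}
    requests.put(f"{ES}/{index}", json=body).raise_for_status()
    print(f"created {index} with write alias metricbeat-{ms}")
```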

From 5 shards you can only go to 1. But after indexing is done, I wonder if this would really be an issue? There is also a shard splitting feature coming: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/indices-split-index.html But please read the limitations there.
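
For reference, the shrink flow is roughly: make the source index read-only, move all its shards onto one node, then shrink. A sketch with made-up index and node names:

```python
# Sketch: shrink a 5-shard daily index down to a single shard.
# Index name, node name, and cluster URL are made up.
import requests

ES = "http://localhost:9200"
source = "metricbeat-2018.01.01"
target = "metricbeat-2018.01.01-shrunk"

# 1) Move all shards to one node and block writes on the source index.
#    (In practice, wait for the relocation to finish before shrinking.)
prep = {
    "settings": {
        "index.routing.allocation.require._name": "shrink-node-1",
        "index.blocks.write": True,
    }
}
requests.put(f"{ES}/{source}/_settings", json=prep).raise_for_status()

# 2) Shrink into a new 1-shard index (5 is prime, so 1 is the only
#    possible target shard count).
body = {"settings": {"index.number_of_shards": 1}}
requests.post(f"{ES}/{source}/_shrink/{target}", json=body).raise_for_status()
```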

I think your advice regarding rollover indices is good; that will allow us to have more evenly-sized indices once we start indexing by metricset.

I wrote a simple Python script to benchmark the query response time of our ES cluster for some representative aggregation queries. I have left it running periodically so I can measure the performance impact of any changes going forward.
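
It is not much more than timing a representative dashboard-style aggregation; something along these lines (this is just the shape of it, the query, fields, and names are placeholders rather than our actual script):

```python
# Sketch of the benchmark's shape: time one representative
# dashboard-style aggregation. Query, fields, and URL are placeholders.
import time
import requests

ES = "http://localhost:9200"

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"metricset.name": "cpu"}},
                {"range": {"@timestamp": {"gte": "now-7d"}}},
            ]
        }
    },
    "aggs": {
        "per_host": {
            "terms": {"field": "beat.hostname", "size": 50},
            "aggs": {
                "over_time": {
                    "date_histogram": {"field": "@timestamp", "interval": "1h"}
                }
            },
        }
    },
}

start = time.monotonic()
resp = requests.post(f"{ES}/metricbeat-*/_search", json=query)
resp.raise_for_status()
wall = time.monotonic() - start
print("took=%sms wall=%.2fs" % (resp.json()["took"], wall))
```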

I am going to see what shrinking older indices to 1 shard does to performance. Should I use forceMerge at all during the process, or does shrinking already combine the Lucene segments?

I'm really curious about the results here. Would you mind sharing them later?

Based on the docs, I assume a force merge does not happen as part of the shrink, but I'm not sure about the inner workings. So using force merge after the shrink could be beneficial, especially as no writes happen to this index anymore (I assume).
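
That last step would be something like the sketch below, run only against indices that no longer receive writes; the index name is made up.

```python
# Sketch: force merge a shrunken, no-longer-written index down to a
# single segment. Index name and cluster URL are made up.
import requests

ES = "http://localhost:9200"
index = "metricbeat-2018.01.01-shrunk"

resp = requests.post(f"{ES}/{index}/_forcemerge", params={"max_num_segments": 1})
resp.raise_for_status()
print(resp.json())
```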
