Storage Estimate for Metricbeat

baxter19 · April 10, 2017, 9:50pm

We are evaluating Metricbeat as a possible replacement for our internal performance and availability monitoring tool, which collects basic metrics around CPU, memory, disk, and network once a minute. During a PoC we collected the equivalent Beats data from a handful of servers and found that sampling at the same interval resulted in fairly large storage usage (several hundred MB for a single server). This was not optimized at all, so I'm sure it can be reduced by being more selective about what we collect, but it does concern us because we want to monitor several thousand servers.

I'm wondering if anyone with experience collecting Metricbeat data from a large number of servers can speak to how they manage the storage and retention requirements. Is there an approach for aggregating results into larger time buckets as the data ages, so that granularity is traded for lower storage?

tudor · April 11, 2017, 11:31am

Here are some tips:

The main consumer of storage space is usually the "per process" stats. If you disable the "process" metricset in the system module, you will likely see a drastic reduction in storage size. If you want the "per process" information, you can whitelist a set of processes to monitor.
For some of the data, you can poll less often. For example, the file system stats (which also generate a lot of data) are fairly static, so you can lengthen their polling period to 30s.
By default, ES uses 5 shards and 1 replica per index. Depending on your setup, you can go down to one shard and zero replicas, which gives a significant improvement, but whether that is acceptable depends on the rest of your setup (zero replicas means there is no redundant copy of the data).
You can use processors to filter out the fields that you don't use. You should review the fields that are added to all events (beat.*, metricset.*) and see if you need all of them.
You can disable the _source and _all fields, although in my testing this doesn't buy much compared with the other suggestions, and it does reduce functionality.
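
To get a rough feel for how the first three tips interact, here is a minimal back-of-envelope sketch in Python. It is not a measurement: the per-metricset document counts, the 10s baseline period, and the 500-byte average event size are all placeholder assumptions, and the function names are made up for illustration. Plug in numbers observed in your own PoC. The only relationships it relies on are that each sample of a metricset produces some number of documents (one per process, one per file system, and so on) and that every replica stores a full extra copy of the primary data.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(period_s, docs_per_sample):
    """Documents written per day by one metricset on one server."""
    return (SECONDS_PER_DAY // period_s) * docs_per_sample


def daily_bytes(metricsets, avg_event_bytes, replicas):
    """Rough daily storage for one server, including replica copies.

    metricsets maps a name to (period in seconds, documents per sample).
    avg_event_bytes is an assumed on-disk size per document.
    """
    events = sum(events_per_day(p, d) for p, d in metricsets.values())
    primary = events * avg_event_bytes
    return primary * (1 + replicas)


# Hypothetical server: 100 processes, 5 file systems, 2 network interfaces.
baseline = {
    "cpu": (10, 1),
    "memory": (10, 1),
    "network": (10, 2),
    "filesystem": (10, 5),
    "process": (10, 100),
}

# Same server with the process metricset disabled and filesystem polled every 30s.
trimmed = {
    "cpu": (10, 1),
    "memory": (10, 1),
    "network": (10, 2),
    "filesystem": (30, 5),
}

AVG_EVENT_BYTES = 500  # placeholder, measure this on your own indices

before = daily_bytes(baseline, AVG_EVENT_BYTES, replicas=1)
after = daily_bytes(trimmed, AVG_EVENT_BYTES, replicas=0)
print(f"baseline: {before / 1e6:.1f} MB/day, trimmed: {after / 1e6:.1f} MB/day")
```

Multiplying the per-server figure by the number of servers and the retention period in days gives a first-order capacity estimate. The sketch ignores the effect of dropping fields and of shard count, since those depend heavily on the actual mapping and cluster.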

There are other optimizations possible, but the ones above gave the highest returns in my tests. We will be re-evaluating the defaults for 6.0 so that Metricbeat is more efficient by default.

