We are evaluating Metricbeat as a possible replacement for our internal performance and availability monitoring tool, which collects basic CPU, memory, disk, and network metrics once a minute. During a PoC we collected the equivalent Beats data from a handful of servers and found that sampling at the same interval produced fairly large storage usage (several hundred MB for a single server). This was not optimized at all, so I'm sure it can be reduced by being more selective about what we collect, but it does give us concern as we look to monitor several thousand servers.
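For context, our PoC configuration was close to the out-of-the-box system module, roughly the sketch below (the period and metricsets reflect what we sampled; we have not yet trimmed any metricsets or fields):

```yaml
# metricbeat.yml (PoC sketch) – essentially the default system module, untrimmed
metricbeat.modules:
  - module: system
    period: 1m
    metricsets:
      - cpu
      - memory
      - network
      - filesystem
      - diskio
```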
I'm wondering if anyone with experience collecting Metricbeat data from a large number of servers can speak to how they manage the storage and retention requirements. Is there an approach for aggregating results into larger time frames as the data ages, so that granularity is reduced in favor of lower storage? The sketch below shows the kind of thing we have in mind.
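For illustration only, here is roughly what we are imagining, expressed as an Elasticsearch rollup job. We are assuming the rollup API is the right mechanism for this; the index pattern, interval, and field list are placeholders, not a tested configuration:

```json
PUT _rollup/job/metricbeat_hourly
{
  "index_pattern": "metricbeat-*",
  "rollup_index": "metricbeat-rollup",
  "cron": "0 30 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" },
    "terms": { "fields": ["host.name"] }
  },
  "metrics": [
    { "field": "system.cpu.total.pct", "metrics": ["avg", "max"] },
    { "field": "system.memory.actual.used.pct", "metrics": ["avg", "max"] }
  ]
}
```

If rollups are not the recommended way to do this, pointers to whatever approach people actually use at this scale would be much appreciated.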