We're attempting to use a dedicated Marvel cluster (2 c3.2xlarge instances) to monitor our ES cluster. We recently expanded the cluster and populated more data, increasing the number of indices (>800) and shards (>6K); this will grow further as we continue to populate the database. Since the expansion, our Marvel indices have grown from 3-5GB per day to over 50GB per day, even though the cluster has only roughly doubled in node count. We now run out of disk space with less than 1 day of Marvel data, whereas previously we used Curator to delete data older than 7 days.
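For reference, the retention job we had been running on the monitoring cluster looked roughly like this (a sketch using Curator 3.x flags; the host and prefix are our values, adjust as needed):

```
# Daily cron job on the monitoring cluster: drop .marvel-* indices
# older than 7 days (sketch; assumes Curator 3.x CLI flags)
curator --host localhost delete indices \
  --prefix .marvel- \
  --timestring '%Y.%m.%d' \
  --time-unit days \
  --older-than 7
```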
We've tried changing two settings so far (the exact commands we used are shown after this list):
- marvel.agent.interval - defaults to 10s; we changed it to 30s to reduce CPU usage on the cluster. This doesn't appear to have changed the amount of data (we still run out of disk in < 1 day), but nodes now frequently show up as not having contacted Marvel recently (the ! next to the node name).
- marvel.agent.indices - defaults to "*"; we changed it to "" hoping to turn off all index reporting and see what effect that has. There was no apparent change in data consumption. Due to the way indices are allocated in the cluster, we cannot choose a single index that exists on all nodes, and we still want node metrics for all nodes.
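For clarity, here's how we applied those two changes (a sketch via the cluster update settings API, which should accept these keys since Marvel 1.3 made the agent settings dynamic; the same keys can also be set in elasticsearch.yml on the monitored nodes, followed by a restart):

```
# Run against the monitored (production) cluster, not the Marvel cluster.
# Assumes Marvel 1.3+ dynamic agent settings; otherwise put these keys
# in elasticsearch.yml and restart the nodes.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "marvel.agent.interval": "30s",
    "marvel.agent.indices": ""
  }
}'
```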
How can we reduce Marvel's data consumption? Ideally I'd like to keep it down to around 10-15GB per day so we don't have to expand the monitoring cluster further; there's not much utility in holding 50GB per day.
Our ES cluster is 1.4.3, same as the Marvel cluster.