Hi @dadoonet, @Christian_Dahlqvist, @Kumar_Narottam,
Here is our situation: we have 20 ES data nodes in total, each with 32GB of RAM, 32 CPU cores, and 4.6TB of storage.
We want to keep one year of metrics and six months of logs online across the whole data tier.
We expect to index around 200GB of metrics and 100~150GB of logs per day, more or less,
and the daily volume will likely grow by about 20% per year.
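To make the sizing concrete, this is roughly how I am estimating the on-disk footprint. It is only a back-of-the-envelope sketch; the single replica, the 180-day figure for six months, the 15% watermark headroom, and the assumption that indexed size is about the same as the raw daily volume are all my guesses, not measured values.

```python
# Rough capacity estimate from the numbers above.
# Assumptions (mine, not measured): 1 replica, indexed size ~= raw size,
# six months ~= 180 days, and ~15% headroom kept free for disk watermarks.
METRICS_GB_PER_DAY = 200
LOGS_GB_PER_DAY = 150            # upper end of the 100~150GB range
METRICS_RETENTION_DAYS = 365     # one year of metrics
LOGS_RETENTION_DAYS = 180        # six months of logs
REPLICAS = 1

primary_gb = (METRICS_GB_PER_DAY * METRICS_RETENTION_DAYS
              + LOGS_GB_PER_DAY * LOGS_RETENTION_DAYS)
total_gb = primary_gb * (1 + REPLICAS)

NODES = 20
DISK_PER_NODE_TB = 4.6
capacity_gb = NODES * DISK_PER_NODE_TB * 1024
usable_gb = capacity_gb * 0.85   # stay below the disk watermarks

print(f"primary data:  {primary_gb / 1024:6.1f} TB")
print(f"with replicas: {total_gb / 1024:6.1f} TB")
print(f"raw capacity:  {capacity_gb / 1024:6.1f} TB (usable ~{usable_gb / 1024:.1f} TB)")
```

Please correct me if the on-disk size after indexing usually differs a lot from the raw ingest volume.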
So we really want to get the shard allocation right across all the data nodes,
so that indexing, reads, and queries against the data nodes all perform well.
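This is how I am currently thinking about the primary shard count for the daily indices. It is just a sketch: the 30GB target shard size follows the commonly cited guidance of keeping shards in the tens-of-GB range, and the single replica is my assumption.

```python
# Sketch: choose a primary shard count per daily index so each shard lands
# near a target size. The 30GB target and the single replica are assumptions.
import math

TARGET_SHARD_GB = 30
REPLICAS = 1
DATA_NODES = 20

def primaries_for(daily_gb: float) -> int:
    """Primary shards needed so each one holds roughly TARGET_SHARD_GB."""
    return max(1, math.ceil(daily_gb / TARGET_SHARD_GB))

for name, daily_gb in [("metricbeat (daily)", 200), ("filebeat (daily)", 150)]:
    primaries = primaries_for(daily_gb)
    copies = primaries * (1 + REPLICAS)
    print(f"{name}: {primaries} primaries -> {copies} shard copies "
          f"spread over {DATA_NODES} data nodes")
```

With daily indices kept for up to a year, I would also like to avoid ending up with too many small shards, since every shard adds overhead in the cluster state and on the heap.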
Our architecture looks like this:
Beats -> HAProxy cluster (2) -> Logstash cluster (3) -> ES cluster {master (3), client (3), data (20)} -> dashboards {Grafana, Kibana}
For Beats, we use Metricbeat to collect the metrics and Filebeat to collect the logs.
HAProxy ships the metrics/logs to the Logstash cluster.
Logstash filters and modifies the original data and then sends it to the ES data nodes.
The client nodes serve the queries coming from the dashboards (Grafana and Kibana).
Grafana is used to visualize the metrics.
Kibana is used to visualize the logs.
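To make the question more concrete, this is roughly how I plan to apply the shard settings. It is only a sketch under assumptions not stated above: daily metricbeat-*/filebeat-* indices, the legacy PUT /_template API, a hypothetical client-node URL, and placeholder shard counts taken from the rough calculation earlier.

```python
# Sketch only: apply per-index shard/replica settings with an index template.
# Assumptions (not from the post above): daily metricbeat-*/filebeat-* indices,
# the legacy PUT /_template API ("index_patterns" becomes "template" on ES 5.x),
# a client node reachable at the URL below, and placeholder shard counts.
import requests

ES_URL = "http://client-node:9200"   # hypothetical client-node address

templates = {
    "metrics-daily": {
        "index_patterns": ["metricbeat-*"],
        "settings": {
            "number_of_shards": 7,    # from the rough calculation above
            "number_of_replicas": 1,
            # spread one day's shards over different nodes instead of stacking them
            "index.routing.allocation.total_shards_per_node": 1,
        },
    },
    "logs-daily": {
        "index_patterns": ["filebeat-*"],
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "index.routing.allocation.total_shards_per_node": 1,
        },
    },
}

for name, body in templates.items():
    resp = requests.put(f"{ES_URL}/_template/{name}", json=body)
    resp.raise_for_status()
    print(name, "->", resp.json())
```

The total_shards_per_node setting is there to force each day's shards onto different nodes, although I understand it can leave shards unassigned if too many nodes are unavailable, so I am not sure it is the right approach.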
I'm looking forward to your replies.