We have several applications running on AWS machines whose data ends up being visualized from an Elasticsearch cluster. Currently we have a cluster of 3 nodes (3 EC2 r5.4xlarge instances), but we are having many performance problems with the visualization (graphs that take a long time to load, cluster crashes due to large queries, etc.).
So now we want to optimize the cluster and take better advantage of horizontal scalability by adding new nodes (probably we can also use less powerful machines than the current ones).
We have 4 apps ingesting data in Elasticsearch:
- App1: 25GB/day. (Currently we only keep the last week of data in Elasticsearch, ~175GB.)
- App2: 8GB/day. (Currently we only keep the last week of data in Elasticsearch, ~56GB.)
- App3: 400MB/day. (Currently we only keep the last 15 days of data in Elasticsearch, ~4GB.)
- App4: 50MB/day. (Currently we only keep the last 30 days of data in Elasticsearch, ~1.5GB.)
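For reference, the per-app figures above add up as follows (a minimal Python sketch using only the numbers stated; this counts primary data only, so replicas would multiply the stored size):

```python
# Totals from the per-app figures quoted above (primary data only).
daily_gb = {"App1": 25.0, "App2": 8.0, "App3": 0.4, "App4": 0.05}
stored_gb = {"App1": 175.0, "App2": 56.0, "App3": 4.0, "App4": 1.5}

total_daily = sum(daily_gb.values())    # ~33.45 GB ingested per day
total_stored = sum(stored_gb.values())  # ~236.5 GB kept in the cluster

print(f"Daily ingest: {total_daily:.2f} GB/day")
print(f"Data retained: {total_stored:.1f} GB")
```

So we are talking about roughly 33.45 GB/day of ingest and about 236.5 GB retained at any time, before counting replicas.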
To "design" the new cluster we have several questions:
- How many nodes should it have for this volume of data?
- How many master-eligible nodes (node.master), data nodes (node.data), etc.?
- How big should the shards be?
- Is it better to use one index per day, or not?
- What aspects should be evaluated when choosing the machines that will make up the cluster? For example: is it better to have 6 nodes with 16GB of RAM or 12 nodes with 8GB of RAM?
- Any other aspects that influence the performance of the visualization.
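To clarify what we mean by the node.master / node.data question, this is the kind of per-node configuration we are thinking about (an elasticsearch.yml fragment; the boolean node.master / node.data settings are the pre-7.x style, so this is just illustrative):

```yaml
# Dedicated master-eligible node (coordinates the cluster, holds no data)
node.master: true
node.data: false

# --- and on a separate machine, a dedicated data node ---
# node.master: false
# node.data: true
```

The question is whether splitting roles like this is worth it at our scale, or whether all nodes should keep both roles.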
I appreciate any kind of help, thank you very much in advance.