I am currently developing some cluster concepts and am once again asking for your opinions and insight. We will need different clusters for different customers. I have some questions about my concepts and hope you can help.
Concept 1, Simple Cluster:
Should be able to handle 200MB/day, maybe more in the future. What should I do when I add new nodes? The Elastic documentation states that beyond 3 nodes I should start giving some nodes specific roles. The problem I have with this is that I want a reliable cluster, so once I start adding specific roles I will need at least 3 new nodes for each role to keep it fault-tolerant. Right? So if I want nodes that are master-only, I will need 3 new nodes that do nothing but the master job. That doesn't sound too good.
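To make my reasoning about "at least 3" concrete, here is a minimal sketch of the majority arithmetic as I understand it: a master can only be elected while more than half of the master-eligible nodes are reachable. The function names are my own for illustration and are not part of any Elasticsearch API.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    # Compare a few cluster sizes; 3 is the smallest that survives one loss.
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible: quorum={quorum(n)}, "
              f"can lose {tolerated_failures(n)}")
```

If this is right, two master-eligible nodes tolerate no failure at all, which is why I assume three dedicated masters is the minimum.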
Concept 2, Master/Data:
This concept should be used when we have customers with more data, around 20GB/day. All customers have timeseries data, so we will never change or delete any of it. Is this a good solution for that? My concern is that the master nodes would be useless and bored. I can't see why they would ever need more than 5% of their capacity, yet they would take up 30% of the budget. At what point do master nodes become useful? Shard allocation and cluster management don't sound resource-intensive.
Concept 3, Hot-Warm:
Is the following concept better for timeseries data? The data is still around 20GB/day. I am happier with this concept because the master nodes are also the warm nodes. I don't think the warm nodes will get a lot of traffic, so they should be fine. Or is this a bad idea?
Is the following concept really better? It would add a lot of cost. (If there is a good reason for dedicated master nodes we are happy to pay for them, but not if they are not that important.)
Concept 4, Hot-Warm-Cold-Master:
This is for the biggest project, 200GB/day or more. I think coordinating nodes are missing from the following concept. They sound really important for such a big cluster with so much data. What is your opinion on coordinating nodes? What configuration should they have?
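To put the three ingest rates side by side, here is a rough back-of-the-envelope sketch of the disk each concept would need. The 30-day retention and the single replica are purely my assumptions for illustration, and the estimate ignores compression and indexing overhead, so the real on-disk size will differ.

```python
def raw_storage_gb(daily_gb: float, retention_days: int, replicas: int = 1) -> float:
    """Raw data volume kept on disk: one primary copy plus `replicas` copies."""
    return daily_gb * retention_days * (1 + replicas)


# Daily ingest per concept, taken from the descriptions above (200MB = 0.2GB).
DAILY_INGEST_GB = {
    "Concept 1": 0.2,
    "Concept 2 / 3": 20.0,
    "Concept 4": 200.0,
}

RETENTION_DAYS = 30  # assumption, not a real requirement
REPLICAS = 1         # assumption: one replica per primary shard

if __name__ == "__main__":
    for name, daily in DAILY_INGEST_GB.items():
        total = raw_storage_gb(daily, RETENTION_DAYS, REPLICAS)
        print(f"{name}: about {total:,.0f} GB for {RETENTION_DAYS} days")
```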
To sum up some questions:
- How important are dedicated master nodes, and at what point should we use them?
- What is a good cluster concept for timeseries data?
- Is there anything "stupid" in my concepts? Open for improvements.
- What do your clusters look like? For comparison.
Thank you for your time,