I have to design an ES cluster and I'm far from understanding everything. I've read various articles (such as Fred de Villamil's "Designing the Perfect Elasticsearch Cluster") and the ES documentation, but I have no one to discuss this with in person, so here I am to get some advice on my first thoughts.
The purpose is not continuous insertion into the cluster, but many huge sets of data (most of them inserted at the beginning) that will be queried and updated: some fields will be updated weekly.
The reasons I chose ES were the flexibility of JSON (each doc won't be updated the same way) and scalability.
Info about the data and usage:
- ~30 billion documents (starting at 20 billion, may reach 40 billion)
- ~300 B/doc (between 200 and 400 B)
- only one index (is that appropriate?)
- few aggregations; mostly queries (in 'filter' context rather than 'query' context, since there is no need for scoring)
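To illustrate what I mean by 'filter' rather than 'query', here is a minimal sketch of the kind of search body I have in mind. The field names (`status`, `last_updated`) and the helper `build_filter_query` are hypothetical placeholders, not my real mapping; the point is only that every clause sits under `bool.filter`, so no relevance score is computed.

```python
import json


def build_filter_query(status, min_updated):
    """Build a search body whose clauses all run in filter context (no scoring).

    `status` and `last_updated` are hypothetical field names used for illustration.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on a keyword-like field
                    {"term": {"status": status}},
                    # Range restriction on a date field
                    {"range": {"last_updated": {"gte": min_updated}}},
                ]
            }
        }
    }


body = build_filter_query("active", "now-7d/d")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of a search request by whichever client is used.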
This gives ~9 TB of data (30 billion docs × ~300 B/doc).
Proposed index config: 512 shards (~15 GB of data per shard)
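For reference, here is a rough sketch of the arithmetic behind these figures. It assumes raw document size only (no replicas, no index overhead) and decimal units (1 TB = 10^12 B); the function name `estimate_sizing` is just for illustration. With 30 billion docs it gives about 17.6 GB per shard, slightly above the 15 GB figure quoted above.

```python
def estimate_sizing(doc_count, bytes_per_doc, shard_count):
    """Return (total size in TB, size per primary shard in GB).

    Assumption: raw data only, ignoring replicas and index overhead.
    """
    total_bytes = doc_count * bytes_per_doc
    total_tb = total_bytes / 1e12
    gb_per_shard = total_bytes / shard_count / 1e9
    return total_tb, gb_per_shard


SHARDS = 512
BYTES_PER_DOC = 300

# Start, expected, and maximum document counts from the list above
for docs in (20_000_000_000, 30_000_000_000, 40_000_000_000):
    total_tb, gb_per_shard = estimate_sizing(docs, BYTES_PER_DOC, SHARDS)
    print(f"{docs:,} docs: {total_tb:.1f} TB total, {gb_per_shard:.1f} GB/shard")
```

Changing `BYTES_PER_DOC` to 200 or 400 shows how much the per-shard size moves across the expected document size range.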
Proposed cluster config, either:
- 1 node, 96-128 GB RAM (24-30 GB for heap)
- 3 nodes:
  - 1 master
  - 2 data: 64/96/128 GB RAM (24-30 GB for heap)
I'd welcome any constructive advice on the cluster configuration, and perhaps an explanation of the real benefit of having 1 dedicated master + 2 data nodes instead of 1 master-and-data node + 2 data nodes.
Thanks