We are about to deploy an Elasticsearch (EL) cluster on our older hardware to put it to use, and we are looking for some advice.
My current setup is 8 blade servers with internal storage and 88 GB of RAM altogether (to be distributed among the blades). The EL cluster is expected to hold around 300 million time-series documents (logs) in total (20 million per month), approximately 160 GB in total size. Indexing is not the main concern, since it is done in bulk from files; the cluster will mainly be used for searches and aggregations (Kibana 4).
My plan is to distribute it as follows:
2x master nodes - holding no data, each with 8 GB of RAM (16 GB total)
3x workhorse nodes - holding data, each with 16 GB of RAM (48 GB total)
1x Logstash server (+1 cold backup) - 8 GB of RAM, running Logstash to process CSV files and send them to the EL cluster
Is this setup OK? Should anything be changed?
And three theoretical questions:
- What is better for searches/aggregations - 4 servers with 16GB RAM each or 2 servers with 32GB RAM each?
- Is it better for all workhorses to have the same amount of memory, or does it not matter at all?
- Should a master node that holds no data have more memory, or should I leave it all to the workhorses?
You should really have an odd number of master-eligible nodes, so that when you set the minimum number of master nodes (discovery.zen.minimum_master_nodes) it is always a majority: (number of masters / 2) + 1.
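To make the majority rule concrete, here is a minimal sketch of the arithmetic in Python. The helper names `minimum_master_nodes` and `tolerated_failures` are made up for illustration; the code only computes numbers and does not talk to a cluster.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: (master-eligible nodes / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority is still reachable."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    # Compare the 2-master plan from the question with 3 masters.
    for n in (2, 3):
        print(f"{n} masters: quorum={minimum_master_nodes(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With 2 masters the quorum is 2, so losing either one leaves the cluster unable to elect a master. With 3 masters the quorum is still 2, and one node can fail.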
1 - It depends on your searches and data size.
2 - Data nodes should definitely have the same amount of heap.
3 - Dedicated masters should not require much memory, maybe a 3 GB or 4 GB heap.
Yes, I see: that is to avoid split brain.
So in my case I must have 3 dedicated master nodes that hold no data and have little memory (4 GB: 2 GB heap, 2 GB left for the OS).
Then I can have any number of data nodes, since 3 dedicated master nodes are quite safe.
Since I will have 3 dedicated data nodes with a large amount of RAM (equally distributed), I will split the data into monthly indices, each with 3 shards and 1 replica.
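As a rough check of what this layout means in shard counts, here is a small Python sketch. It assumes the figures from the first post (300 million documents at 20 million per month) and that shard copies spread evenly over the data nodes; `shard_layout` is a hypothetical helper, not an Elasticsearch API.

```python
def shard_layout(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count shard copies for one index and for each data node (assuming an even spread)."""
    copies_per_index = primaries * (1 + replicas)
    return {
        "copies_per_index": copies_per_index,
        "copies_per_node": copies_per_index / data_nodes,
        # A primary and its replicas are never placed on the same node,
        # so every copy can only be assigned if there are enough nodes.
        "all_copies_assignable": replicas + 1 <= data_nodes,
    }


if __name__ == "__main__":
    months = 300_000_000 // 20_000_000      # number of monthly indices
    layout = shard_layout(primaries=3, replicas=1, data_nodes=3)
    print(months, layout)
    print("total shard copies in cluster:", months * layout["copies_per_index"])
```

Under these assumptions there are 15 monthly indices, each with 6 shard copies (2 per data node), for 90 shard copies in the whole cluster.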
What do you think? Is this good thinking?
For the number of nodes in your cluster, you are probably better off just adding another data node or two. Dedicated masters are nice but overkill here.
So you would suggest not having dedicated master nodes (that is, having more data nodes, some of which also have master functionality)?
I was also thinking of having more nodes with less memory, but I didn't know whether that has any advantages.
For your current use case, yes.
I have re-evaluated all the possibilities and concluded the following:
3x dedicated master nodes (4-8 GB RAM)
6x data nodes (16 GB RAM: 8 GB heap, 8 GB for garbage collection + OS)
Therefore I will use monthly indices, each divided into 6 shards (so that each node holds one shard) with 2 replicas, to make sure it is resilient enough.
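For illustration, this is roughly what the per-index settings for that plan could look like, sketched in Python. The `logs-` prefix and the `YYYY.MM` naming are assumptions made for the example (they are not from this thread); `number_of_shards` and `number_of_replicas` are the standard index settings, and the printed body is what one would send when creating each monthly index.

```python
import json
from datetime import date

INDEX_PREFIX = "logs-"  # hypothetical prefix, chosen only for this example


def monthly_index_name(day: date) -> str:
    """Name of the monthly index a log line from `day` would go into."""
    return f"{INDEX_PREFIX}{day:%Y.%m}"


def index_settings(shards: int = 6, replicas: int = 2) -> dict:
    """Request body for creating an index with the planned shard layout."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    body = index_settings()
    print(monthly_index_name(date(2015, 6, 17)))
    print(json.dumps(body, indent=2))
    # 6 primaries, each with 2 replicas, is 18 shard copies per index,
    # i.e. 3 copies per node on 6 data nodes if spread evenly.
    s = body["settings"]
    print(s["number_of_shards"] * (1 + s["number_of_replicas"]))
```

Note that with 2 replicas each node ends up holding about 3 shard copies of every monthly index, not 1.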
I will use the master nodes as client nodes and install Logstash on one of them to process requests and push them directly to the data nodes for indexing.
My main focus is aggregations over all the logs, with occasional searches.
Any comments, please?
Anyway, @warkolm, thank you for your help.