Hello everyone,
/!\ NB: First, I am really sorry for the ASCII art, but I haven't been able to upload images since this morning and I don't know why.
I have read a lot about how an Elastic cluster should or could be set up, and I have watched a lot of videos too, but I fear I have missed something before starting an installation. This is why I want to share what I plan to do, and maybe I can get some advice on it.
Over a test period, I estimated the daily volume of events at about 13 GB. I plan to add some log sources, so let's say I need 15 GB per day. I need to keep these logs for 6 months, so:
Total capacity needed = daily log volume × retention period
Total capacity needed = 15 GB × (30 × 6) = 15 GB × 180
Total capacity needed = 2,700 GB = 2.7 TB
Let's round up to 3TB to be sure I will never have storage issues.
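To make the arithmetic easy to redo if my estimates change, here is a minimal Python sketch of the same calculation. The function name `total_capacity_gb` is only an illustration, and I am assuming decimal units (1 TB = 1000 GB) and a 30-day month, as in the figures above.

```python
def total_capacity_gb(daily_gb: float, retention_days: int) -> float:
    """Raw storage needed to keep `retention_days` of logs at `daily_gb` per day."""
    return daily_gb * retention_days


daily_gb = 15            # estimated daily volume, including planned new sources
retention_days = 30 * 6  # 6 months, approximated as 180 days

needed_gb = total_capacity_gb(daily_gb, retention_days)
print(f"{needed_gb:.0f} GB = {needed_gb / 1000:.1f} TB")
```

This only counts primary data. It ignores replicas and any free-space headroom the nodes may need.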
NB: Currently I have to delete my indices manually because I am on a single node for testing with only 300 GB of storage, so I keep my logs for less than 1 month.
Now I would like to build a real architecture with the Elastic Stack. Here is how I imagine it:
Servers | CPU | Disk | RAM |
---|---|---|---|
Elasticsearch - Node 1 | 4 or 8 cores | 1 TB | 16 GB |
Elasticsearch - Node 2 | 4 or 8 cores | 1 TB | 16 GB |
Elasticsearch - Node 3 | 4 or 8 cores | 1 TB | 16 GB |
Logstash | 8 cores | 20 GB | 4 - 8 GB |
Kibana | 2 cores | 20 GB | 4 GB |
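As a sanity check of the table against my 2.7 TB estimate, here is a rough Python sketch comparing the required storage with the raw disk of the three Elasticsearch nodes. The function `storage_check` and its `replicas` parameter are illustrative assumptions on my side. The comparison uses raw disk only and does not account for any free space Elasticsearch needs to keep on each node.

```python
def storage_check(primary_gb, replicas, nodes, disk_per_node_gb):
    """Return (required_gb, available_gb, fits) for a given replica count."""
    required = primary_gb * (1 + replicas)  # each replica is a full extra copy
    available = nodes * disk_per_node_gb    # raw disk summed over all data nodes
    return required, available, required <= available


# 2700 GB of primary data on 3 nodes with 1 TB (1000 GB) of disk each
for replicas in (0, 1):
    required, available, fits = storage_check(2700, replicas, nodes=3, disk_per_node_gb=1000)
    print(f"replicas={replicas}: need {required} GB, have {available} GB, fits={fits}")
```

With these numbers the data only fits without replicas, and even then with little margin.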
                  __________________________________________________________
Logstash         | CLUSTER                                                  |
.----.           |                                                          |
| == |           |                  Elasticsearch - Node 1                  |
|    |           |                           /  \                           |
| == |  =====>   |                          /    \                          |  <=== https://kibana:5601
|    |           |                         /      \                         |
|::::|           |                        /        \                        |
|___.|           |   Elasticsearch - Node 2 <---> Elasticsearch - Node 3    |
                 |                                                          |
                  ----------------------------------------------------------
I started with a homogeneous architecture because I think it meets my needs (I have some questions to make sure of that). I don't plan to set replicas because they would cost too much storage (3 TB × 2 = 6 TB) and I can't afford that. I have some doubts about the architecture, so here are some questions:
1 - If I omit node roles, each node will take on every role, but there will be only one master at a time. Let's assume node 1 is the master: to which data node (2 or 3) will Logstash send data?
Is there a rule like "Node 2 currently has more free storage, so it will receive the data", or, as the cluster is synchronized, will shards be shared between the 2 data nodes?
2 - When do the roles change, knowing that each node can be master (and assuming none of them fails)? Is it possible that node 1 stays the master node for 2 months, so that all the storage on it goes unused?
3 - Given that this is not a hot-warm architecture, but each node can take the hot or warm role, if I create a lifecycle policy for my indices with hot and warm phases before deletion, will it work correctly to meet my need for 6 months of retention?
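To make question 3 concrete, here is a rough Python sketch that builds the kind of ILM policy body I have in mind and prints it as JSON. Only the 180-day deletion comes from my requirements. The daily rollover, the 30-day move to warm and the `set_priority` value are placeholder assumptions, and I have not tested this policy on a cluster.

```python
import json

RETENTION_DAYS = 30 * 6  # 6 months of retention, approximated as 180 days

# Body for the ILM "put lifecycle policy" API (PUT _ilm/policy/<policy-name>).
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Assumption: start a new index every day
                    "rollover": {"max_age": "1d"}
                }
            },
            "warm": {
                # Assumption: move to the warm phase 30 days after rollover
                "min_age": "30d",
                "actions": {
                    "set_priority": {"priority": 50}
                },
            },
            "delete": {
                # Delete the index once it is past the retention period
                "min_age": f"{RETENTION_DAYS}d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

The printed JSON is what I would send as the request body, for example from Kibana Dev Tools.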
Sorry, that's a lot of questions, but despite all the topics I have read and the videos I have watched, there are some features that I don't understand properly.
Thanks in advance!