I am preparing the following Elasticsearch cluster architecture:
Total 6 Nodes:
1 Node with roles: master and remote_cluster_client
3 Nodes with roles: data, data_hot, data_content and ingest
1 Node with role data_warm
1 Node with Kibana
The goal is to store logs from custom applications that I develop. I expect about 15k-20k log entries per day. The logs will stay on the hot nodes for 2 months, because they will be accessed from Kibana for reporting. After these 2 months the logs will move to the warm node (using ILM). After 6 months on the warm node, the index will be sent to an S3 archive.
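For illustration, the retention plan above could be written as an ILM policy body along these lines. This is only a sketch: the function name `build_ilm_policy`, the SLM policy name `s3-archive`, the 30-day rollover and the priority value are assumptions, not anything taken from the cluster described here. Note that ILM measures `min_age` from rollover (or from index creation if there is no rollover), so individual log entries can stay on hot somewhat longer than 60 days.

```python
import json


def build_ilm_policy(hot_days=60, warm_days=180, slm_policy="s3-archive"):
    """Build an ILM policy body: hot -> warm -> delete after a snapshot.

    slm_policy is a hypothetical SLM policy name that snapshots to S3.
    """
    # The delete phase starts once the hot and warm periods have both elapsed.
    delete_after_days = hot_days + warm_days
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    # Assumed rollover interval; at this volume daily
                    # indices would likely be very small.
                    "actions": {"rollover": {"max_age": "30d"}},
                },
                "warm": {
                    "min_age": f"{hot_days}d",
                    # Moving to the warm tier is handled by ILM's implicit
                    # migrate action when data tiers are in use.
                    "actions": {"set_priority": {"priority": 50}},
                },
                "delete": {
                    "min_age": f"{delete_after_days}d",
                    "actions": {
                        # Wait until the named SLM policy has taken a
                        # snapshot before the index is deleted.
                        "wait_for_snapshot": {"policy": slm_policy},
                        "delete": {},
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

The printed JSON is the kind of body that would be sent to the ILM put-policy API (`PUT _ilm/policy/<name>`). The snapshot repository and the SLM policy would have to be set up separately.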
The questions that I have are the following:
Is the architecture suitable for the required job?
Is 1 Node with Master role enough? What will happen if the master goes down?
From the documentation, I understand that the ILM policy can keep the data on the warm node for a given period and archive it to S3 before deleting it. What is the procedure to restore logs for a specific date that have already been sent to S3 (that is, past the warm node period)?
You should always aim to have 3 master-eligible nodes, as that allows the cluster to continue operating if 1 is unavailable. It would therefore be better to have 3 nodes with the master, data_hot, data_content and ingest roles. Unless you are going to have multiple clusters and use cross-cluster search, I am not sure why you would use remote_cluster_client. If you want warm data to also be highly available, I would have 2 data nodes with the data_warm role.
No. A single master node going down would make the cluster unavailable, and you would lose all data if the master node was permanently lost.
Thank you very much, Christian.
I will also put the master role on the data nodes. In that case, is it possible to set a priority so that the master role is held only by the dedicated master node and is transferred to the other nodes only if it fails?
What kind of backup strategy (apart from snapshots) would be good to implement on the nodes at the server level?
Make the 3 hot data nodes master-eligible and do not use another dedicated master node. You want 3 master-eligible nodes, not 4. You cannot (and do not need to) control which node is elected master at any point.
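The reasoning behind "3, not 4" can be shown with plain majority arithmetic. This is a simplified sketch, not Elasticsearch's actual election code, and the function names are made up for the example: a master can only be elected while more than half of the master-eligible nodes are reachable.

```python
def quorum_size(master_eligible: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum_size(master_eligible)


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(
            f"{n} master-eligible: quorum {quorum_size(n)}, "
            f"tolerates {tolerated_failures(n)} failure(s)"
        )
```

With 1 master-eligible node no failure is tolerated, with 3 one failure is tolerated, and a fourth node does not raise that number, which is why it adds nothing here.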