Elasticsearch cluster architecture

parthmaniar · August 6, 2020, 3:56pm

I am planning to deploy a cluster with 2 nodes. The proposed architecture is below; I would like a community review.

Purpose: Non-enterprise. Final year project that will be collecting data from internet sensors (honeypots) for the next 6 months.
Primary risk consideration: Downtime. Any downtime would break the chain of collection.
Proposed architecture: 2 Node cluster.

As a student I have limited compute resources. I have a single workstation that will serve both nodes (I understand the downside of sharing a single piece of underlying hardware; however, given the situation, I have to accept the risk). I have a NAS which backs up the VMs every 12 hours to minimise the data recovery point objective.

Here are my questions:

1. The Logstash pipeline has the IP of the primary node. What steps should I take when I have to take the primary node offline? How do I manage auto-switching of IPs?
2. Whenever I have to take a node offline, what precautions do I have to take? (I may need to take the VM offline for security patching or for tuning the underlying OS for my project.)
3. Is giving the non-master node less hardware OK? My primary need is ingestion and assimilation, not multiple queries per second or minute. I will be the lone user of the system and will be querying a large amount of data two or three times a week at most.

Thank you.

hendry.lim · August 7, 2020, 6:11am

1. Configure Logstash to point to both nodes, instead of one.
   Note: you only have 2 nodes, so you should enable node.master on both nodes. If one node is down, the other node will be able to take over.
2. You may follow the rolling upgrade doc, but essentially, set the cluster to allow only primary shard allocation during maintenance.
3. Should be fine for testing purposes. It will depend on the amount of data you expect to query/store in Elasticsearch. For reference, I am running a fairly small cluster: 3 nodes with 4 vCores and 8GB RAM per node.
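
A minimal sketch of the maintenance advice above (allow only primary shard allocation while a node is away), following the general shape of the rolling upgrade guide. The helper name `maintenance_requests` is hypothetical; it only builds the REST calls as (method, path, body) tuples and does not contact a cluster. It assumes Elasticsearch 7.x, where `cluster.routing.allocation.enable` accepts `primaries` and is reset to its default by setting it to `null`.

```python
import json


def maintenance_requests():
    """Return the REST calls to issue around a single-node restart, in order.

    Hypothetical helper: it only describes the requests, it sends nothing.
    """
    before = [
        # Allow only primary shards to be allocated while the node is away,
        # so the cluster does not start rebuilding replicas elsewhere.
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": "primaries"}}),
        # Flush so that shard recovery after the restart has less to replay.
        ("POST", "/_flush", None),
    ]
    after = [
        # Setting the value to null restores the default ("all").
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": None}}),
        # Check cluster health before touching the next node.
        ("GET", "/_cluster/health", None),
    ]
    return before, after


if __name__ == "__main__":
    before, after = maintenance_requests()
    for label, calls in (("before stopping the node", before),
                         ("after the node has rejoined", after)):
        print(f"# {label}")
        for method, path, body in calls:
            print(method, path)
            if body is not None:
                print(json.dumps(body, indent=2))
```

The printed requests can be pasted into Kibana Dev Tools or sent with any HTTP client; the node itself is stopped and restarted between the two groups.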

parthmaniar · August 7, 2020, 6:49am

Thank you very much @hendry.lim

Is there a way I can order the nodes? For example, keep the one with higher compute and memory first and the lesser one second?

I read a guide which states that it is better to have one master node and keep the second node as a data node. What are the downsides of this? (Source: https://logz.io/blog/elasticsearch-cluster-tutorial/)

What precautions should I take when putting the primary/secondary node into maintenance?

hendry.lim · August 7, 2020, 7:08am

1. Not that I know of. Elasticsearch is a distributed system. We don't usually order nodes in a distributed system; doing so defeats the purpose imo.
2. Actually, the recommended minimum viable cluster is 3 nodes, not 2, with the default minimum quorum of 2.
3. There is no primary/secondary concept in Elasticsearch (see no. 1 above). As long as your remaining node is able to handle all the load/data volume, for testing purposes, I wouldn't think too much about it.
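
The "3 nodes, quorum of 2" point above comes down to simple majority arithmetic, sketched here. The function names are hypothetical, and this is only the arithmetic: Elasticsearch 7.x manages its voting configuration itself, so real behaviour can differ in detail.

```python
def quorum(master_eligible: int) -> int:
    """Votes needed for a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} master-eligible: quorum={quorum(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With two master-eligible nodes the majority is still 2, so a plain majority vote survives no failures; a third voter is what buys tolerance for losing one node.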

parthmaniar · August 7, 2020, 11:29am

Thank you very much @hendry.lim.

Are the following configurations OK? The reason I am spending time finalising them is that I cannot take the system offline: log collection will be real time and I cannot lose telemetry data.

Setting discovery hosts:

discovery.seed_hosts: ["host1", "host2"]

Setting master nodes (do I need to change the order before taking node-1 offline?):

cluster.initial_master_nodes: ["node-1", "node-2"]

Before disconnecting a node, I plan to take the respective node out of shard allocation using:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "ip of node-1"
  }
}

My question here is: is there an impact if I upgrade one node to the next update of the ELK stack (7.8.1 to whatever comes next)? Or do I need to stop all ingestion and upgrade them together?

I was reading this documentation page: https://www.elastic.co/guide/en/elasticsearch/reference/current/high-availability-cluster-small-clusters.html
A node with node.voting_only: true and other roles (such as data and master) turned off: is this doable? Can this node be provisioned without the same storage (including IOPS) requirements as the primary and secondary nodes? I want to build a 3-node cluster in which two nodes store data and provide HA while the third one is only a tiebreaker.

Thank you for guiding me.

hendry.lim · August 7, 2020, 12:02pm

For upgrades, please refer to this rolling upgrades doc.
You will have to upgrade one node at a time if you want to keep your cluster online. See the above doc.
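
For the one-node-at-a-time approach, a small sketch of the check between nodes. The function name `safe_to_continue` and the policy are assumptions; the fields (`status`, `number_of_nodes`, `relocating_shards`, `initializing_shards`) are taken from the `GET _cluster/health` response. During a version upgrade, replicas may stay unassigned until a second node is on the new version, so the status can sit at yellow for a while; the `allow_yellow` flag covers that case.

```python
def safe_to_continue(health: dict, expected_nodes: int,
                     allow_yellow: bool = False) -> bool:
    """Decide from a cluster health response whether to move to the next node."""
    ok_status = {"green", "yellow"} if allow_yellow else {"green"}
    return (
        health.get("status") in ok_status
        # Every node, including the one just restarted, has rejoined.
        and health.get("number_of_nodes") == expected_nodes
        # No shard movement or recovery is still in progress.
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
    )


if __name__ == "__main__":
    recovering = {"status": "yellow", "number_of_nodes": 3,
                  "relocating_shards": 0, "initializing_shards": 2}
    settled = {"status": "green", "number_of_nodes": 3,
               "relocating_shards": 0, "initializing_shards": 0}
    print(safe_to_continue(recovering, expected_nodes=3))
    print(safe_to_continue(settled, expected_nodes=3))
```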

You don't need to change any configuration to bring down a node.

Yup, certainly, the node.voting_only: true config is to set up a tie-breaker node. You can set up a very light ES node as a tie-breaker node, because it can only vote to fulfill the quorum.

parthmaniar · August 7, 2020, 7:39pm

This documentation (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts) recommends not listing dedicated master nodes in the Logstash pipeline. Given that I plan to have 2 master nodes and one voting node, does this mean I should give only the two master nodes' IPs? Do I need to enable load balancing for this to work? (I reckon not.)

Also, if I am using one certificate per cluster, how do I configure individual certs when there are two IPs in the LS pipeline?

hendry.lim · August 8, 2020, 12:12am

You don't have dedicated master nodes. Please read this node doc.

You do not configure individual node cert in LS. You only configure the CA cert in LS. Probably one more topic for you to read up about.
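
To make the two points above concrete (list both data nodes, configure only the CA cert), here is a sketch that renders a Logstash output block as text. The function name, host names and certificate path are hypothetical. The option names `hosts`, `ssl` and `cacert` are the ones I understand the 7.x elasticsearch output plugin to use; later plugin releases rename the SSL options, so check the plugin docs for your version. As far as I know, the plugin spreads requests across the listed hosts on its own when given an array, with no separate load-balancing switch.

```python
def render_es_output(hosts, cacert_path):
    """Render a Logstash `elasticsearch` output block as text."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_list = ", ".join(f'"{h}"' for h in hosts)
    return (
        "output {\n"
        "  elasticsearch {\n"
        f"    hosts => [{host_list}]\n"
        "    ssl => true\n"
        # One CA certificate covers every node whose cert that CA signed.
        f'    cacert => "{cacert_path}"\n'
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical host names and path, for illustration only.
    print(render_es_output(
        ["https://node-1:9200", "https://node-2:9200"],
        "/etc/logstash/certs/ca.crt",
    ))
```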

parthmaniar · August 8, 2020, 6:49am

Thank you very much @hendry.lim.

How's this configuration for two master nodes:

node.master: true 
node.voting_only: false

While for the voting node (to be a tie-breaker):

node.master: true
node.voting_only: true
node.data: false 
node.ingest: false 
node.ml: false 
xpack.ml.enabled: false 
node.transform: false 
node.remote_cluster_client: false

Does this look OK?

hendry.lim · August 8, 2020, 9:25am

Looks fine for master/data nodes, except that you don't really need this node.voting_only: false, because that's the default.
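
One thing worth checking on the tie-breaker: as far as I know, in 7.x a voting-only node still has to be master-eligible, so node.voting_only: true goes together with node.master: true, and the node refuses to start otherwise. A tiny sketch of that rule follows; `check_roles` is a hypothetical helper, and the defaults it assumes (node.master true, node.voting_only false) are the 7.x defaults.

```python
def check_roles(settings: dict) -> list:
    """Return a list of problems found in 7.x-style node role settings."""
    problems = []
    master = settings.get("node.master", True)             # 7.x default
    voting_only = settings.get("node.voting_only", False)  # 7.x default
    if voting_only and not master:
        problems.append("node.voting_only: true requires node.master: true")
    return problems


if __name__ == "__main__":
    tie_breaker = {
        "node.master": True,
        "node.voting_only": True,
        "node.data": False,
        "node.ingest": False,
    }
    print(check_roles(tie_breaker))
    print(check_roles({"node.master": False, "node.voting_only": True}))
```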

parthmaniar · August 8, 2020, 9:32am

Perfect. I will keep this thread updated on how it goes over the next few months.

Have a wonderful, healthy and safe time ahead.

parthmaniar · August 9, 2020, 5:55am

Are the following settings correct to create the cluster?

discovery.seed_hosts: ["primarynode", "secondarynode", "votingonlynode"]

and

cluster.initial_master_nodes: ["primarynode", "secondarynode"]

The smaller node used only for voting will not be added to the initial master nodes setting.

Thank you.

wajika · August 13, 2020, 8:58am

If I have a cluster with 1 master and 3 nodes, does the rolling upgrade start from the master node first?

Christian_Dahlqvist · August 13, 2020, 9:07am

You should always look to have 3 master eligible nodes in any cluster.

wajika · August 13, 2020, 9:12am

Thank you for your reply

hendry.lim · August 13, 2020, 9:18am

The rolling upgrades doc has the suggested order you should follow when performing rolling upgrade.
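
As I understand the 7.x rolling upgrade guide, the suggested order is to leave master-eligible nodes until last. A sketch of that ordering follows; the `Node` type, the `upgrade_order` function and the node names are hypothetical, and the guide for your version remains the authority.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    master_eligible: bool


def upgrade_order(nodes):
    """Put non-master-eligible nodes first, keeping the given order otherwise."""
    # sorted() is stable, and False sorts before True.
    return sorted(nodes, key=lambda n: n.master_eligible)


if __name__ == "__main__":
    cluster = [
        Node("master-1", True),
        Node("data-1", False),
        Node("data-2", False),
        Node("data-3", False),
    ]
    print([n.name for n in upgrade_order(cluster)])
```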

system · September 10, 2020, 9:18am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
