Elasticsearch cluster architecture

I am planning to deploy a cluster with 2 nodes. The proposed architecture follows; I would like a community review.

  1. Purpose: Non-enterprise. Final-year project that will be collecting data from internet sensors (honeypots) for the next 6 months.
  2. Primary risk consideration: Downtime. Any downtime would break the chain of collection.
  3. Proposed architecture: 2 Node cluster.

As a student I have limited compute resources. I have a single workstation that will serve both nodes (I understand the downside of sharing a single piece of underlying hardware; however, given the situation I have to accept the risk). I have a NAS which backs up the VMs every 12 hours to minimise the recovery point objective.

Here are my questions:

  1. The Logstash pipeline has the IP of the primary node. What steps do I need to take when I have to take the primary node offline? How do I manage auto-switching of IPs?

  2. Whenever I have to take a node offline, what precautions do I have to take? (I may need to take the VM offline for security patching or for tuning the underlying OS for my project.)

  3. Is giving the non-master node lesser hardware OK? My primary need is ingestion and assimilation, not multiple queries per second or minute. I will be the lone user of the system and will query large amounts of data two or three times a week at most.

Thank you.

  1. Configure Logstash to point to both nodes, instead of one.
     Note: you only have 2 nodes, so you should enable node.master on both nodes. If one node is down, the other node will be able to take over.

  2. You may follow the rolling upgrade doc, but essentially, set cluster.routing.allocation.enable to primaries so that only primary shards are allocated during maintenance.

  3. Should be fine for testing purposes. It will depend on the amount of data you expect to query/store in Elasticsearch. For reference, I am running a fairly small 3-node cluster with 4 vCores and 8 GB RAM per node.
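
To make the maintenance step in point 2 concrete, here is a minimal sketch of the two request bodies involved, assuming the approach from the rolling upgrade doc: restrict allocation to primaries before stopping a node, then reset the setting once the node has rejoined. The helper name allocation_settings is made up for illustration; only the setting name and its values come from Elasticsearch.

```python
import json


def allocation_settings(value):
    """Build a body for PUT _cluster/settings.

    Pass "primaries" before maintenance, or None afterwards to reset
    the setting to its default (which allows all shard allocation).
    """
    return {"persistent": {"cluster.routing.allocation.enable": value}}


before_maintenance = allocation_settings("primaries")
after_maintenance = allocation_settings(None)

print(json.dumps(before_maintenance))
print(json.dumps(after_maintenance))  # None is serialised as JSON null
```

Each body would be sent as a PUT to _cluster/settings (from Kibana Dev Tools, curl, or a client library of your choice); the sketch deliberately stops short of making the HTTP call.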

Thank you very much @hendry.lim

  1. Is there a way I can order the nodes? For example, keep the one with higher compute and memory first and the lesser one second?

  2. I read a guide that states it is better to have one master node and keep the second node as a data node. What are the downsides of this? (Source: https://logz.io/blog/elasticsearch-cluster-tutorial/)

  3. What are the precautions when I am putting the primary/secondary node into maintenance?
  1. Not that I know of. Elasticsearch is a distributed system, and we don't usually order nodes in a distributed system; that would defeat the purpose, IMO.
  2. Actually the recommended minimum viable cluster is 3 nodes, not 2, with the default minimum quorum of 2.
  3. There is no primary/secondary concept in Elasticsearch (see no. 1 above). As long as your remaining node is able to handle all the load/data volume, for testing purposes, I wouldn't think too much about it.
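
The quorum point in no. 2 is plain majority arithmetic, sketched below. This is only an illustration of why 3 master-eligible nodes tolerate one failure while 2 tolerate none; the function names are made up, and Elasticsearch 7.x manages its voting configuration automatically, so nothing here needs to be configured by hand.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3):
    print(f"{n} master-eligible: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

With 2 nodes the majority is 2, so the arithmetic leaves no headroom; adding a third voter brings the tolerated failures to 1 without raising the quorum.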

Thank you very much @hendry.lim.

Are the following configurations OK? The reason I am spending time finalising this is that I cannot take the system offline: log collection will be real-time and I cannot lose telemetry data.

  1. Setting discovery hosts:
discovery.seed_hosts: ["host1", "host2"]
  2. Setting master nodes (do I need to change the order before taking node-1 offline?):
cluster.initial_master_nodes: ["node-1", "node-2"]
  3. Before disconnecting a node, I plan to take the respective node out of shard allocation using:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "ip of node-1"
  }
}

My question here is: is there an impact if I upgrade one node to the next release of the ELK stack (7.8.1 to whatever comes next)? Or do I need to stop all ingestion and upgrade them together?

I was reading this documentation page: https://www.elastic.co/guide/en/elasticsearch/reference/current/high-availability-cluster-small-clusters.html
Is a node with node.voting_only: true and the other roles (such as data and master) turned off doable? And can this node be provisioned without the same storage (including IOPS) requirements as the primary and secondary nodes? I want to build a 3-node cluster in which two nodes store data and provide HA while the third is only a tiebreaker.

Thank you for guiding me.

For upgrades, please refer to this rolling upgrades doc.
You will have to upgrade one node at a time if you want to keep your cluster online. See the above doc.

You don't need to change any configuration to bring down a node.

Yup, certainly, the node.voting_only: true config is to set up a tie-breaker node. You can set up a very light ES node as a tie-breaker node, because it can only vote to fulfill the quorum.


This documentation (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts) recommends not listing dedicated master nodes in the Logstash pipeline. Given that I plan to have 2 master nodes and one voting node, does this mean I should give only the two master nodes' IPs? Do I need to enable load balancing for this to work? (I reckon not.)

Also, if I am using one certificate per cluster, how do I configure individual certs for the two IPs in the LS pipeline?

You don't have dedicated master nodes. Please read this node doc.

You do not configure individual node certs in LS. You only configure the CA cert in LS. Probably one more topic for you to read up on.
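
As a rough illustration of "both nodes plus one CA cert", here is a small sketch that renders a Logstash elasticsearch output block. The option names hosts, ssl and cacert are the ones I know from the 7.x version of the output plugin; please check them against the plugin docs for your version. The host names and the certificate path are placeholders, and logstash_es_output is a made-up helper.

```python
def logstash_es_output(hosts, cacert_path):
    """Render a Logstash elasticsearch output that lists every data node
    and trusts a single CA certificate (no per-node certificates)."""
    host_list = ", ".join(f'"{h}"' for h in hosts)
    return (
        "output {\n"
        "  elasticsearch {\n"
        f"    hosts => [{host_list}]\n"
        "    ssl => true\n"
        f'    cacert => "{cacert_path}"\n'
        "  }\n"
        "}\n"
    )


config = logstash_es_output(
    ["https://node-1:9200", "https://node-2:9200"],  # placeholder hosts
    "/etc/logstash/certs/ca.crt",                    # placeholder path
)
print(config)
```

Because both nodes present certificates signed by the same CA, one cacert entry covers them; the tiebreaker is left out of hosts since it holds no data.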


Thank you very much @hendry.lim.

How's this configuration for two master nodes:

node.master: true 
node.voting_only: false 

While for the voting node (to be a tie-breaker)

node.master: true   # a voting-only node must still be master-eligible
node.voting_only: true
node.data: false 
node.ingest: false 
node.ml: false 
xpack.ml.enabled: false 
node.transform: false 
node.remote_cluster_client: false 

Does this look OK?

Looks fine for master/data nodes, except that you don't really need this node.voting_only: false, because that's the default.
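
One thing worth double-checking on the tie-breaker: as far as I know, a voting-only node still has to be master-eligible, so node.voting_only: true only works together with node.master: true. Below is a tiny sanity check for the legacy 7.x role flags under that assumption; check_roles and the sample dictionaries are made up for illustration and are not part of any Elasticsearch tooling.

```python
def check_roles(settings):
    """Return a list of problems found in legacy (7.x style) node role flags."""
    problems = []
    master = settings.get("node.master", True)              # default: true
    voting_only = settings.get("node.voting_only", False)   # default: false
    if voting_only and not master:
        problems.append("node.voting_only: true requires node.master: true")
    return problems


tiebreaker = {"node.master": True, "node.voting_only": True, "node.data": False}
broken = {"node.master": False, "node.voting_only": True, "node.data": False}

print(check_roles(tiebreaker))
print(check_roles(broken))
```

The first dictionary passes the check; the second is flagged because voting-only is set on a node that is not master-eligible.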


Perfect. I will keep this thread updated on how it goes overall over the next few months.

Have a wonderful, healthy and safe time ahead.

Are the following settings correct to create the cluster?

discovery.seed_hosts: ["primarynode", "secondarynode", "votingonlynode"]

and

cluster.initial_master_nodes: ["primarynode", "secondarynode"]

The smaller node used only for voting will not be added to the initial master node settings.

Thank you.

If I have a cluster with 1 master and 3 nodes, does the rolling upgrade start from the master node first?

You should always look to have 3 master eligible nodes in any cluster.


Thank you for your reply

The rolling upgrades doc has the suggested order you should follow when performing rolling upgrade.

