Setting up a multi-nodes cluster on a single machine ( 1 elasticsearch.yml file )

Catalin · May 6, 2021, 11:52am

I have been using elasticsearch for development on my local machine and managed to setup a 3 nodes cluster by starting each node with the following command:

elasticsearch -Epath.data=data1 -Epath.logs=log1 -Enode.name=node1
elasticsearch -Epath.data=data2 -Epath.logs=log2 -Enode.name=node2
elasticsearch -Epath.data=data3 -Epath.logs=log3 -Enode.name=node3

And having discovery.zen.minimum_master_nodes: 2 in elasticsearch.yml

For production I also have only 1 machine and I'm looking to setup the same 3 nodes cluster.

The issue is that pretty much all examples describe each node having its own elasticsearch.yml where you can set the configurations for each separate node.

Am I missing something or is it just the difference between using different machines for each node instead of 1 ? And if so, if I'm using the setup described above, is it somehow possible to setup individual nodes differently under the same elasticsearch.yml

Like:

node.node1.ingest: false ?

Christian_Dahlqvist · May 6, 2021, 1:26pm

Each node need its own data directory and config file. What do you expect to gain from having multiple nodes on a single host?

Catalin · May 6, 2021, 7:30pm

Each node does have it's own data directory in the case I described. They just happen to have the same config file.

What do you expect to gain from having multiple nodes on a single host?

Redundancy, in case one fails. Isn't that the whole point ? Also parallelism.

If you have a machine with 4-8 or 32 cores. Are you using a single node ? So, why install Elasticsearch x number of times on the same server when it allows you to create multiple nodes with only one installation ? I simply don't understand the point of it in the first place. Can you please explain.

leandrojmp · May 6, 2021, 8:04pm

Unless you have a large bare-metal server with 128 GB RAM or more, having multiple nodes on the same host does not makes much sense.

You will have multiple small nodes competing for the same resources, elasticsearch is more bound to the disk and memory than to the CPU cores and in the end if your host fails, your entire cluster fails. In this case is better to have a larger node instance.

But to run multiple nodes in the same hosts you need to have a different elasticsearch.yml for every node with separated data and log folders, there isn't a way to use the same elasticsearch.yml to run multiple nodes at the same time.

Even if you have large servers with 128 GB or more, I would recommend to use docker to run the instances instead of configure multiple nodes in the same host.

Catalin · May 6, 2021, 8:17pm

I understand.

Yet on a single node cluster replica shards can't be allocated. Isn't this important as well ?

If you are running a single instance of Elasticsearch, you have a cluster of one node. All primary shards reside on the single node. No replica shards can be allocated, therefore the cluster state remains yellow. The cluster is fully functional but is at risk of data loss in the event of a failure.

If I start Elsticsearch instances with the following command, I'll end up having 1 config folder, so basically 1 elasticsearch.yml file, and multiple data folders, each holding separate nodes. There's also a node.max_local_storage_nodes setting.

elasticsearch -Epath.data=data1 -Epath.logs=log1 -Enode.name=node1
elasticsearch -Epath.data=data2 -Epath.logs=log2 -Enode.name=node2
elasticsearch -Epath.data=data3 -Epath.logs=log3 -Enode.name=node3

Christian_Dahlqvist · May 6, 2021, 8:28pm

Do the nodes share storage or do they have separate disks?

Catalin · May 6, 2021, 8:40pm

Share storage. It just stuck to me the idea that you need multiple nodes, no matter what, and that it automatically increases performance and reliability regardless of the actual number of servers. The idea was consolidated by the fact that you can create such a thing as multiple nodes and a single elasticsearch.yml

I will probably follow your advice and create only 1 node, but the whole thing is a bit weird, because all documentation seems to point towards having multiple node clusters.

Christian_Dahlqvist · May 6, 2021, 8:42pm

Multiple nodes improve resiliency and performance as long as they are on different hosts and do not share common points of failure.

leandrojmp · May 6, 2021, 8:50pm

Besides what Christian already said, running multiple nodes in the same host with shared storage and using replicas can make your performance worst because you will be using the same IO resources to write the same thing in the same place many times.

For example, if you have an index with 3 shards and 1 replica, in total you will have 6 shards, 3 primaries and 3 replicas, but in the end your storage is shared, as are your resources, so you will be writing the same thing in the same place twice.

Also, when you use elasticsearch -E config=value, you are in reality not using elasticsearch.yml for this config, I don't know if you could pass every node config using command line, never tested, but I would recommend to stick with one elasticsearch.yml per node if you want to follow in this path.

And the config node.max_local_storage_nodes is deprecated, it will be removed in version 8.X

Catalin · May 6, 2021, 9:13pm

Got it. In this case, when needing to increase performance, does it make more sense to split the same resources over multiple machines ? Do you think there is a significant difference in performance in having (for example):

1 host with 8 cores / 32 GB RAM or
2 hosts with 4 cores / 16 GB RAM each ?

Or the more the better, like:

8 hosts with 1 core / 4 GB RAM each ?

leandrojmp · May 6, 2021, 9:51pm

It really depends on your use case, but there are some things that you could follow.

It is recommended to leave 50% of the memory for the system processes and use other 50% to the JVM Heap size of the elasticsearch process.

With 32 GB of RAM, you can have a Heap size of 16 GB and there is another recommendation that is to have a maximum of 20 shards for each GB of Heap per node, so with 16 GB of Heap you would be able to have something like 320 shards in this node.

So, since elasticsearch will per default balance the number of shards between the nodes, in theory a single node with 32 GB of RAM and 16 GB of Heap is a better choice.

You can read more about it in this blog post that has some recommendations regarding heap and sharding.

Abhishek_Shinde · May 7, 2021, 4:45am

Any recommendations for number of cores as well?

Christian_Dahlqvist · May 7, 2021, 5:46am

It is always recommended to have at least 3 master eligible nodes. I would stick to 3 nodes and make them reasonably large rather than setting up lots of very small nodes. The amount of CPU required will depend on your use case so you need to test. Elasticsearch is IMHO more often limited to storage speed than CPU so you need to find a balance of resources.

Catalin · May 7, 2021, 10:12am

Just so I can get an idea about future optimizations. At this point I have 2 indices:

index A: 1GB, 750.000 documents
index B: 30mb, 350.000 documents

The documents are products and the most expensive operation is running all (7) aggregations on all products for which I get a response in 0.5 seconds. This while it's only me using the server (my local computer). As soon as I start filtering by different aggregations the response time decreases. And for full text search I get the response in under 0.1 seconds.

Are these good starting results ? And should I go with 1 shard per each index and only increase the number of shards in relation to the volume of data ( each extra 20GB as per the documentation ) ? This is my first elasticsearch project that I'm currently looking to deploy to production. Because I plan to increase the indices 10-20 times I'm a bit worried weather I'm on the right path or not and thought I'd ask this as well.

Thank you!

DavidTurner · May 8, 2021, 9:15am

This isn't quite right. It is recommended to leave at least 50% of the memory for the system processes and filesystem cache (and Elasticsearch's non-heap memory needs) and use at most 50% to the JVM Heap size of the Elasticsearch process. There are definitely cases where a smaller heap performs better thanks to the larger filesystem cache.

MCDELTAT · May 13, 2021, 9:02pm

Good points from everyone else thus far. If you're looking to just learn how the general configuration of a cluster is configured and cant spin up multiple hosts (physical or VM) then you can do it in Docker.

Elastic's own docs point out how to do that in Docker Compose. I've done this a few times for testing, but that's it. Install Elasticsearch with Docker | Elasticsearch Guide [7.12] | Elastic

system · June 10, 2021, 9:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How do I create multiple nodes on the same computer Elasticsearch	4	13637	June 21, 2017
3 Single Node Elastic search to 3 Node cluster Elasticsearch	2	299	March 10, 2022
How to set up multiple clusters Elasticsearch	14	10749	December 29, 2017
Talking nodes on same machine Elasticsearch	2	423	September 7, 2018
Elasticsearch 7.11 with 3 nodes Elasticsearch	7	794	April 9, 2021

Setting up a multi-nodes cluster on a single machine ( 1 elasticsearch.yml file )

Related topics