In our company we run a small Elasticsearch cluster that has by now grown to 6 data nodes. These were added over time and are therefore equipped quite differently, especially in terms of disk capacity.
Now we are planning to renew the cluster, and my first question is: Is it still true that Elasticsearch assumes all data nodes in the cluster have the same amount of disk space? I know that this used to be the case. I have researched the topic, but most of the posts I found were fairly old. So I wonder whether this is still recommended or even required.
Furthermore, I would like to know whether it is possible to run multiple data node instances per server, or whether there are things that absolutely speak against it. Of course you would need a lot of RAM and CPU, and each instance would have its own disks, but they would still have to share, for example, the network interfaces. So what are the advantages and disadvantages of running multiple instances on one machine?
I don't believe ES assumes that. It works with each node's actual disk usage, so you could have a cluster with different instance types and storage sizes.
From my own experience, I just keep all data nodes the same. It's a lot easier to manage.
There might be a legitimate reason you need them to be different (maybe a super large cluster?).
My personal strategy is to have uniform data nodes, period.
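To illustrate why mixed disk sizes can work but need more attention: as far as I know, ES decides about shard placement per node based on disk usage thresholds (the `cluster.routing.allocation.disk.watermark.*` settings, which default to 85%, 90% and 95%). Here is a rough Python sketch of that idea. It is a simplified model, not what the allocator actually does; the node names and sizes are made up, and it ignores things like the headroom settings in newer versions.

```python
from dataclasses import dataclass

# Default percentage watermarks (cluster.routing.allocation.disk.watermark.*).
# Assumption: percentage-based defaults, no max_headroom settings applied.
LOW, HIGH, FLOOD = 85.0, 90.0, 95.0


@dataclass
class Node:
    name: str        # hypothetical node name
    total_gb: float  # total disk capacity
    used_gb: float   # disk space currently in use


def disk_status(node: Node) -> str:
    """Classify a node by the highest watermark its disk usage has reached."""
    used_pct = 100.0 * node.used_gb / node.total_gb
    if used_pct >= FLOOD:
        return "flood_stage"  # indices on this node get a write block
    if used_pct >= HIGH:
        return "high"         # ES tries to move shards away
    if used_pct >= LOW:
        return "low"          # no new shards are allocated here
    return "ok"


# Made-up example: a small node fills up long before a large one does.
nodes = [Node("small", 500, 450), Node("large", 4000, 1200)]
for n in nodes:
    print(n.name, disk_status(n))
```

The thresholds are relative to each node's own disk, so a small node reaches them with far less data on it than a large one. That is one reason uniform nodes are easier to reason about.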
You could run multiple instances of ES on a single server, but why? It's a waste of resources. Unless you are a data center hosting multiple clients, I don't think it's practical. You could run your cluster as Docker containers and decouple the number of nodes from the number of servers, but the waste can't be justified by the "convenience". I am often trying to figure out ways to squeeze more out of my cluster, so avoiding waste from the get-go would be a wise decision.