I have 2 powerful servers, each with 256 GB of memory and ~1 TB of SSD.
So I am going to run 2 Elasticsearch instances on each server, each configured to use 31 GB of memory,
either as separate processes or as Docker containers.
I have to tune my app for high indexing throughput. To that end, I read that
in order to have very performant data nodes, I should use
1 master (no-data) node. It is also good for having a non-changing IP address.
I am thinking of putting the master (no-data) node on a VM (on a different server).
If so, what specifications should it have? Is anybody using a similar setup?
I would like your opinion on:
 using Docker containers for the Elasticsearch instances vs. running Elasticsearch as a plain process
 the setup of a VM to be used as the master (no-data) node of the cluster
I would appreciate your feedback, as I am preparing for production deployment.
Hi Nikos, we run master and data nodes in Docker containers on AWS.
First of all, you definitely want more than one master node for master election to work. If your only master fails, your cluster will stop working. Data nodes cannot become master nodes on their own. I recommend 3 master nodes; we use low-end machines (2 GB of RAM) for them and don't even send queries their way.
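To make the "3 master nodes" advice concrete, here is a minimal sketch of the majority arithmetic behind master election. It assumes the usual strict-majority rule (the value that Zen-discovery versions of Elasticsearch expect in discovery.zen.minimum_master_nodes); the function names are mine and are not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest number of master-eligible nodes that forms a strict majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible: quorum={minimum_master_nodes(n)}, "
              f"can lose={tolerated_failures(n)}")
```

With 1 or 2 master-eligible nodes no failure can be tolerated, which is why 3 is the usual minimum.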
IMHO there isn't much difference between running a process within a Docker container vs. directly on the host. Just make sure you mount data volumes into the container and don't store anything in the container's own filesystem.
To avoid dealing with IPs, we put a load balancer in front of our ES cluster and added just the data nodes to it.
Just some food for thought. One problem I see with your setup: with only two physical servers, no matter how many instances you run on each one, if a host fails, all of its instances fail with it, and I am not sure how well you will be able to recover from that without a real majority of master-eligible nodes. Eventually the remaining node will become its own cluster, assuming there is a master instance on it, but I would expect a bit of downtime until that happens.
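The host-failure concern can be checked with a small sketch. It assumes a strict-majority quorum over master-eligible nodes; the node and host names are made up for illustration.

```python
from collections import Counter
from typing import Dict


def survives_host_failure(placement: Dict[str, str]) -> Dict[str, bool]:
    """For each host, report whether a majority of master-eligible nodes
    is still alive after that host (and every node on it) goes down.

    placement maps a master-eligible node name to the host it runs on.
    """
    total = len(placement)
    quorum = total // 2 + 1
    nodes_per_host = Counter(placement.values())
    return {host: (total - count) >= quorum
            for host, count in nodes_per_host.items()}


if __name__ == "__main__":
    # Hypothetical layout: two servers, two master-eligible instances each.
    two_hosts = {"es1": "server-a", "es2": "server-a",
                 "es3": "server-b", "es4": "server-b"}
    print(survives_host_failure(two_hosts))

    # Same layout plus one master-eligible node on a separate VM.
    with_vm = dict(two_hosts, es5="vm-c")
    print(survives_host_failure(with_vm))
```

In the two-server layout, losing either server leaves 2 of 4 nodes, which is below the quorum of 3. Adding a fifth master-eligible node on a third machine lets the cluster keep a majority when any single host fails.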
Thanks @emptyemail for your feedback.
Yes, all three will be master-eligible, but I was thinking of designating the no-data node on the VM explicitly as the master node, to handle the clustering overhead and the queries.
If so, what would the specifications be?
Do you notice any delay when mounting the data volume in Docker compared to the plain process approach?
I am pretty sure it doesn't work that way. If they are all master-eligible, then you have no control over which one is going to be elected master.
A node that processes queries is very different from a pure master node. You might be thinking of a client node; take a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
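As a rough sketch of the difference between these node types, the snippet below renders the elasticsearch.yml lines for each role. It assumes the older boolean settings node.master and node.data; later versions add further flags such as node.ingest and eventually replace these with node.roles, so check the page linked above for your version. The role labels and the helper name are mine.

```python
# Role label -> settings, using the legacy boolean flags (an assumption;
# newer Elasticsearch versions configure roles differently).
ROLE_SETTINGS = {
    "dedicated_master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
}


def render_yaml(role: str) -> str:
    """Return the elasticsearch.yml lines for the given role label."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(f"{key}: {str(value).lower()}"
                     for key, value in settings.items())


if __name__ == "__main__":
    for role in ROLE_SETTINGS:
        print(f"# {role}")
        print(render_yaml(role))
        print()
```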
I use c3.large AWS instance types for master nodes.
I've never tried to run client nodes, but the goal would be to have a couple of low-powered master nodes for the cluster overhead, and client nodes to communicate with the outside world and send data and queries straight to the right nodes and shards. You will get the most benefit from client nodes if you use routing, in which case the client node can send the data or query to just one shard rather than to all of them.
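To illustrate why routing narrows a request to one shard, here is a minimal sketch of the idea "shard = hash(routing value) mod number of primary shards". Elasticsearch uses its own Murmur3-based hash; zlib.crc32 below is only a stand-in, so the shard numbers will not match a real cluster. The function names and the example key are hypothetical.

```python
import zlib
from typing import List, Optional


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number (stand-in hash, not Murmur3)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def shards_to_query(routing_key: Optional[str],
                    num_primary_shards: int) -> List[int]:
    """Without a routing key every shard is searched; with one, only one."""
    if routing_key is None:
        return list(range(num_primary_shards))
    return [shard_for(routing_key, num_primary_shards)]


if __name__ == "__main__":
    print(shards_to_query(None, 5))       # fan-out to all shards
    print(shards_to_query("user-42", 5))  # a single shard
```

The same key always lands on the same shard, so documents indexed with a routing value can later be searched with that value without touching the other shards.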
I've never compared running ES as a plain process vs. with a mounted volume in Docker, but from a Linux management point of view there is almost no overhead in running Docker containers, and by mounting volumes into the container you are essentially doing a mount -o bind, which is really fast.
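A small sketch of what such a bind mount looks like on the docker command line, built as an argument list so it can be inspected before running. The host directory and image tag are placeholders, and the container-side path /usr/share/elasticsearch/data is an assumption that matches the official image's default data directory; adjust it if your image stores data elsewhere.

```python
import shlex
from typing import List


def docker_run_args(host_data_dir: str, image: str,
                    container_data_dir: str = "/usr/share/elasticsearch/data"
                    ) -> List[str]:
    """Build a `docker run` argument list that bind-mounts the host data
    directory into the container, so index data lives on the host disk."""
    return [
        "docker", "run", "-d",
        "-v", f"{host_data_dir}:{container_data_dir}",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical host path and image tag, for illustration only.
    args = docker_run_args("/srv/es/node1", "elasticsearch:TAG")
    print(" ".join(shlex.quote(a) for a in args))
```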