Elastic machine learning on a different node but same server

Hi, I'm thinking about running two containers:
elasticsearch master/data/ingest node -> container A
elasticsearch machine learning -> container B
but they must run on the same server.

I think there might be two approaches:

  1. putting elastic machine learning service on a different port e.g. 9201
  2. merging into one container and running as master/data/ingest/machine learning node.

Regarding point 1, I have no idea how to point to a different set of binaries. The default is:
May I use:
/usr/share/elasticsearch2/ ?

Which approach would be better, and why?

Best regards,

I think this question is Docker-specific, and there seems to be a misunderstanding of how containers are used. Point 1 sounds right, but the question about binaries does not apply: the Docker image already contains the binaries, and the image should not be modified.
You should simply run two Docker containers and, if needed, map each one to a different local port on the Docker host. You may not even need to map every node: there is no real reason to do so for the ML node, since it does not normally need to be reachable from the Docker host.
Modifying the Docker image to run two instances of Elasticsearch would be unsupported, and it makes little sense from a Docker standpoint: a container is normally expected to run one main process (PID 1), and when that process exits the container terminates.
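
As a rough illustration of the two-container layout, here is a minimal Compose sketch. It assumes a 7.x image that still accepts the node.master / node.ml style settings. The image tag, service names, cluster name and network name are placeholders, not values from this thread, and the discovery settings differ between major versions, so check the Docker guide for your version before relying on it.

```yaml
# docker-compose.yml (sketch only; tag, service names and cluster name are assumptions)
services:
  es-main:                         # master/data/ingest node (container A)
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.1
    environment:
      - node.name=es-main
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-ml
      - cluster.initial_master_nodes=es-main
      - node.ml=false              # do not run ML jobs on this node
      - xpack.ml.enabled=true      # keep the ML APIs enabled
    ports:
      - "9200:9200"                # only this node is published on the Docker host
    networks:
      - elastic

  es-ml:                           # dedicated machine learning node (container B)
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.1
    environment:
      - node.name=es-ml
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-main
      - node.master=false
      - node.data=false
      - node.ingest=false
      - node.ml=true
      - xpack.ml.enabled=true
    # No "ports:" entry: the nodes talk to each other over the shared network.
    networks:
      - elastic

networks:
  elastic:
    driver: bridge
```

Both containers use the unmodified image and differ only in their environment variables, so nothing inside the image (such as a second binary directory) has to change.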

For volumes (whether you use named volumes or bind mounts), each Docker container gets its own volume mapped to the container directory /usr/share/elasticsearch/data, as in our documentation.

And of course, this is not Elasticsearch-specific: any container should also be limited in vCPUs and memory, so that resources are not over-allocated when you run multiple containers on one Docker host (this avoids noisy-neighbour issues).
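
A hedged Compose fragment showing both points for a single service: its own data volume and explicit resource caps. The service name, volume name, image tag and the numbers are illustrative assumptions. The mem_limit and cpus keys are understood by the 2.x Compose file format and the current Compose Specification; the 3.x format expresses limits under deploy.resources.limits instead.

```yaml
# Fragment of a docker-compose.yml (names and sizes are assumptions)
services:
  es-ml:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.1
    environment:
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"               # JVM heap size for this node
    volumes:
      - es-ml-data:/usr/share/elasticsearch/data   # one data volume per container
    mem_limit: 4g                                  # hard memory cap for the container
    cpus: 2                                        # at most two CPUs' worth of CPU time

volumes:
  es-ml-data:                                      # named volume; a bind mount works too
```

The container memory limit is deliberately larger than the JVM heap here, leaving room for everything that runs outside the heap.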


So the best and simplest solution would be to make this Elastic node a master/ingest/data/machine learning node and use port 9200 for all of these roles. The only thing needed is to put the following lines in elasticsearch.yml:

node.master: true
node.data: true
node.ingest: true
node.ml: true
xpack.ml.enabled: true

Is that correct?

I am not sure which version the question is about, so it is best to check the documentation for the version you use. Generally, though, node.ml and xpack.ml.enabled both default to true, just like the other settings you mentioned, so you could omit all of them. If you want to run all roles on the one node, then yes, those settings are correct, whether they are written in elasticsearch.yml or passed to the container via environment variables. If you want to separate the roles and have a dedicated ML node, disable ML on the master/data node and disable all the other roles on the ML node.
You can run GET _cat/nodes?v to see which roles each node is using (see the doc for the latest version).
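
For the split-role variant, the two configurations might look like the sketch below. It is written for versions that use the node.master / node.data / node.ingest / node.ml settings shown earlier in this thread; later releases (7.9 onwards) replace them with a single node.roles setting, so treat this as an assumption to verify against the docs for your version. The two YAML documents belong in two separate elasticsearch.yml files, one per container.

```yaml
# elasticsearch.yml for the master/data/ingest node
node.master: true
node.data: true
node.ingest: true
node.ml: false            # no ML jobs are scheduled on this node
xpack.ml.enabled: true    # ML APIs stay enabled
---
# elasticsearch.yml for the dedicated ML node
node.master: false
node.data: false
node.ingest: false
node.ml: true             # this node only runs ML jobs
xpack.ml.enabled: true
```

With the official image, the same keys can be passed as environment variables instead of editing the file.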

Thanks man. Is there any benefit to running machine learning as a standalone node?

The main reasons are scalability and high availability. ML nodes and data nodes both use a lot of CPU and memory (ML runs outside the JVM heap), so running everything on the same node can lead to performance issues. For example, an ML job that uses a lot of CPU can make the data node slower at ingestion or search.

