Hi, I'm thinking about running two containers:
elasticsearch master/data/ingest node -> container A
elasticsearch machine learning -> container B
but they must run on the same server.
I think there might be two approaches:
1. putting the elasticsearch machine learning service on a different port, e.g. 9201
2. merging everything into one container and running it as a master/data/ingest/machine learning node.
Regarding point 1, I have no idea how to point the two instances at different binaries; the default location is:
/usr/share/elasticsearch.
May I use:
/usr/share/elasticsearch1/
/usr/share/elasticsearch2/ ?
I think this question is docker specific, and there seems to be a misunderstanding of how containers are used: point 1 sounds correct, but the question about the binaries does not. Containers are started from a docker image, and the docker image should not be modified.
You should simply use two docker containers, and each can be mapped to a different local port on the docker host, if you even need to map each node. There is no real reason to map the ML node: it does not normally need to be accessible from the docker host.
Modifying the docker image to run two instances of elasticsearch would be unsupported, and it does not make much sense from a docker standpoint: a container is normally expected to run one process (with PID 1), and if that process ends, the container terminates.
For volumes (whether you use named or bind volumes), each docker container gets its own volume mapped to the container directory /usr/share/elasticsearch/data, as in our documentation.
And of course this is not elasticsearch specific: any container should also be limited in vCPUs and memory so there is no over-allocation of resources when you run multiple containers on a docker host (to avoid noisy-neighbour issues).
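As a rough sketch only, the points above (two containers, one data volume each, resource limits, only the master/data/ingest node published on the host) could look like the compose file below. The image tag, volume names, heap sizes, and limits are placeholders, and the role settings assume a 7.x-style configuration; check the docs for your version.

```yaml
version: "3.8"
services:
  es-node-a:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - node.name=es-node-a
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-node-ml
      - cluster.initial_master_nodes=es-node-a
      - node.ml=false                      # ML runs on the dedicated ML node
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9200:9200"                        # only this node is exposed on the host
    volumes:
      - esdata-a:/usr/share/elasticsearch/data
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 2g
  es-node-ml:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - node.name=es-node-ml
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-node-a
      - cluster.initial_master_nodes=es-node-a
      - node.master=false
      - node.data=false
      - node.ingest=false
      - node.ml=true
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    # no ports mapping: the ML node does not need to be reachable from the host
    volumes:
      - esdata-ml:/usr/share/elasticsearch/data
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 2g
volumes:
  esdata-a:
  esdata-ml:
```

Both nodes talk to each other over the default compose network, so the ML node never needs a host port at all.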
@Julien
So the best and simplest solution would be to make this elastic node a master/ingest/data/machine learning node and use port 9200 for all of these roles. The only thing needed is to put lines in the elasticsearch.yml as follows:
I am not sure which version that question is for, and it is best to check the doc for the version you use, but generally the node.ml and xpack.ml.enabled settings both default to true, just like the other settings you mentioned (so you could omit all of these settings). If you want to run all roles on one node, then yes, those settings in elasticsearch.yml, passed to the container via environment variables, are correct. (If you want to separate the roles to have one ML node, you should disable ml on the master-data node and disable all the other roles on the ML node.)
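Assuming 7.x-style settings (the exact names vary by version, so check the doc for yours), the role separation described above might look like this in each node's elasticsearch.yml:

```yaml
# elasticsearch.yml for the master/data/ingest node
node.master: true   # default; shown for clarity
node.data: true     # default; shown for clarity
node.ingest: true   # default; shown for clarity
node.ml: false      # ML jobs go to the dedicated ML node

# elasticsearch.yml for the dedicated ML node
# node.master: false
# node.data: false
# node.ingest: false
# node.ml: true            # default; shown for clarity
# xpack.ml.enabled: true   # default; shown for clarity
```

Each of these can also be passed to the container as an environment variable (e.g. `node.ml=false`) instead of editing the file.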
You can check with GET _cat/nodes?v to see which roles each node has (doc for latest version).
The main reason is scalability and high availability. ML and data nodes both use a lot of CPU and memory (ML runs outside the JVM heap), so running everything on the same node can lead to performance issues (for example, an ML job making the data node slower for ingestion or search when ML uses a lot of CPU).