I am new to Elastic and machine learning. From the documentation, I gather that for a production cluster it is better to run machine learning on a dedicated ML node. In my project, dedicating a separate node to machine learning may require separate hardware, and our content storage capacity may take a hit. When I read the docs, I got the idea that I can set my ML node to be a data node as well. I know that a dedicated master or a dedicated coordinating node should not be set as an ML node. We have dedicated data nodes (hot, warm and cold). So is my thinking correct that my ML node can also be used as a data node?
Any help or guidance is appreciated.
An ML node can also be a data node. It can be a master node or a coordinating node too. In fact, if you run a single-node cluster on your laptop for testing, that one node will have all these roles.
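To see which of your nodes combine roles, one option is to look at the `roles` array each node reports (for example in the `GET _nodes` output) and classify it. The snippet below is only a sketch: `describe_node` and `DATA_ROLES` are hypothetical names made up for illustration, and the role names assume the `node.roles` naming used from Elasticsearch 7.9 onwards.

```python
# Data-tier role names, assuming the node.roles naming from Elasticsearch 7.9+.
DATA_ROLES = {"data", "data_content", "data_hot", "data_warm", "data_cold", "data_frozen"}


def describe_node(roles):
    """Classify a node by whether it carries the ml role, a data role, or both.

    `roles` is a list of role names such as ["data_hot", "ml"].
    This helper is hypothetical; it is not part of any Elastic client.
    """
    role_set = set(roles)
    has_ml = "ml" in role_set
    has_data = bool(role_set & DATA_ROLES)
    if has_ml and has_data:
        return "joint ML/data"
    if has_ml:
        return "ML without data roles"
    if has_data:
        return "data without ML"
    return "neither ML nor data"


if __name__ == "__main__":
    # Example role lists; substitute the ones your own cluster reports.
    for roles in (["data_hot", "ml"], ["ml", "remote_cluster_client"], ["data_cold"], ["master"]):
        print(roles, "->", describe_node(roles))
```

A node with roles such as `["data_hot", "ml"]` is the joint ML/data case discussed in this thread.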
The reason we recommend not making joint ML/data nodes for serious production use relates to the memory on the machine outside of the JVM heap. ML nodes use memory outside of the JVM heap for running native analytics processes. Data nodes rely on memory outside of the JVM heap being available to the operating system as a file system cache; without this OS-level disk caching, search performance will be poor. So, on a joint ML/data node, the ML processes and the file system cache will be fighting over the same memory outside of the JVM heap.
Whether it matters depends on whether you need optimum performance. If you are not pushing your cluster very hard and would rather save money by running fewer nodes then that's a decision you can make.
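To put rough numbers on the contention described above, here is a back-of-the-envelope sketch. It assumes, pessimistically, that ML uses its full allowance under the `xpack.ml.max_machine_memory_percent` setting (30 percent of machine memory by default, as far as I know; check the docs for your version) and that whatever is left outside the JVM heap goes to the file system cache. The function `off_heap_budget` and the 64 GB / 32 GB figures are illustrative assumptions, not a sizing recommendation.

```python
def off_heap_budget(total_ram_gb, jvm_heap_gb, ml_percent=30.0):
    """Estimate how RAM on a joint ML/data node is split.

    Simplified model (an assumption, not how Elasticsearch accounts for memory):
    - ML native processes may use up to ml_percent of total machine memory.
    - Everything outside the JVM heap that ML does not use is left for the
      operating system's file system cache.
    """
    if total_ram_gb <= 0 or jvm_heap_gb < 0 or jvm_heap_gb > total_ram_gb:
        raise ValueError("heap must be between 0 and total RAM")
    if not 0 <= ml_percent <= 100:
        raise ValueError("ml_percent must be between 0 and 100")

    off_heap_gb = total_ram_gb - jvm_heap_gb
    ml_native_gb = total_ram_gb * ml_percent / 100.0
    # The cache cannot go negative; zero means ML could consume all off-heap memory.
    fs_cache_gb = max(off_heap_gb - ml_native_gb, 0.0)
    return {
        "off_heap_gb": off_heap_gb,
        "ml_native_gb": ml_native_gb,
        "fs_cache_gb": fs_cache_gb,
    }


if __name__ == "__main__":
    # Hypothetical node: 64 GB RAM with a 32 GB heap.
    joint = off_heap_budget(64, 32)                   # ML and data on one node
    data_only = off_heap_budget(64, 32, ml_percent=0)  # same node without ML
    print("joint ML/data node:", joint)
    print("data-only node:    ", data_only)
```

Comparing the two results shows how much file system cache a data node could lose in the worst case once ML jobs are running on it. If that loss is acceptable for your search load, a joint node is a reasonable way to save hardware.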
Thank you for clearing up my query; I appreciate it. I will keep these things in mind and proceed accordingly.