I have deployed a StatefulSet in Kubernetes that creates three Elasticsearch nodes, all of type master+data, sharing the same volume. Then I saw this warning:
Never run different node types (i.e. master, data) from the same data directory. This can lead to unexpected data loss.
Is there a risk if all nodes are of the same type (master and data)? I read the warning as saying only that nodes of different types must not share a data directory. For example, you could not have 2 master-only nodes and 3 data-only nodes on the same volume. That would suggest there is no risk when all nodes are identical (master+data).
Huh, I've never seen that warning before, and it's not clear why it's there. It is of course perfectly fine to set node.master: true and node.data: true on a node; indeed this is the default configuration.
However, as a general rule you should avoid setting node.max_local_storage_nodes in production, and instead give each node a different data path.
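In Kubernetes, one way to give each node its own data path is to let the StatefulSet create a separate PersistentVolumeClaim per pod through `volumeClaimTemplates`, rather than mounting one shared volume into every pod. The manifest below is a minimal sketch under stated assumptions: the names `es` and `data`, the image version and the storage size are hypothetical; a headless Service named `es` is assumed to exist; and the discovery and other production settings needed to form a real 3-node cluster are omitted. The mount path `/usr/share/elasticsearch/data` is the data directory used by the official Docker image. Roles are not set because master+data is the default.

```yaml
# Minimal sketch (hypothetical names, version and size): each replica gets
# its own PersistentVolumeClaim, so no two Elasticsearch nodes share a data path.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es                      # hypothetical name
spec:
  serviceName: es               # assumes a headless Service "es" already exists
  replicas: 3
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0  # assumed version
          # node.max_local_storage_nodes is deliberately not set (default: 1)
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:         # one PVC per pod: data-es-0, data-es-1, data-es-2
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi       # assumed size
```

With `node.max_local_storage_nodes` left at its default of 1, a second node pointed at a data path that is already locked should fail to start instead of silently taking another subfolder, so an accidentally shared volume shows up immediately.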
OK, I see. If you have multiple nodes running on the same data path, each one picks a different subfolder in which to keep its data, and there's no guarantee that a node will use the same subfolder after a restart. If a master-only node starts on a subfolder that previously belonged to a data-only node, it will ignore the index data there. After a later restart, that subfolder might be picked up by a data node again, which could then re-import stale index data.
Avoid node.max_local_storage_nodes and you should be ok.
If all nodes have the same roles (i.e. node.master: true and node.data: true), is there still a risk that, after a restart, a node picks up a subfolder that previously belonged to a different node?
I ask because of what you mentioned earlier:
no guarantee that they will use the same subfolder after a restart
I understand that if nodes have the same roles, two things can happen when nodes restart:
A) A node may pick up a subfolder other than the one it was using before, but since all nodes have the same roles, this is not a risk.
B) Each node uses the subfolder it was previously using.
Yes, a node may end up in a different subfolder, but this shouldn't matter. It means that the correspondence between node.name (set in elasticsearch.yml) and the node ID (stored in the data path) gets mixed up, but Elasticsearch is designed to cope with this.