Hi,
I have a cluster with one dedicated master node and two data nodes. I use my master node's URL to search my documents. What are the advantages and disadvantages of using a dedicated master node?
Also, is it better to use the master node's URL or one of my data nodes' URLs for search calls?
It's important that the master node runs smoothly - if it spends a lot of time doing non-master tasks (e.g. searches or indexing) then this can harm the performance of the whole cluster. By using a dedicated master node you can keep it free from non-master tasks which normally helps with cluster stability.
The disadvantage is that you need more nodes if you separate them out like this.
It is better not to use the master for searches. Any node can coordinate a search, so you can send search requests to the data nodes instead.
That said, on small and lightly-loaded clusters you can often get away without using dedicated master nodes.
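As a rough illustration of keeping searches off the master, here is a minimal Python sketch that rotates search requests across the data nodes only. The node names, URLs, and role sets are hypothetical placeholders, not taken from the cluster described above, and the official clients can usually be given a list of hosts to do something similar for you.

```python
from itertools import cycle

# Hypothetical node list: names, URLs, and roles are placeholders.
NODES = [
    {"name": "master-1", "url": "http://master-1:9200", "roles": {"master"}},
    {"name": "data-1", "url": "http://data-1:9200", "roles": {"data"}},
    {"name": "data-2", "url": "http://data-2:9200", "roles": {"data"}},
]


def search_urls(nodes):
    """Return the URLs of nodes that hold data, skipping dedicated masters."""
    return [n["url"] for n in nodes if "data" in n["roles"]]


def make_picker(nodes):
    """Return a function that yields the next search URL in round-robin order."""
    urls = search_urls(nodes)
    if not urls:
        raise ValueError("no data nodes available for searches")
    rotation = cycle(urls)
    return lambda: next(rotation)


pick = make_picker(NODES)
# Each call returns the data node to send the next search request to.
print([pick() for _ in range(4)])
```

With this arrangement the dedicated master never receives search traffic, and the two data nodes share the coordinating work.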
How do you define lightly-loaded clusters here? I have around 50,000 documents distributed over my two data nodes, each of which has 16GB of RAM. The number of documents will increase over time. So having a dedicated master node will help my cluster's stability, right?
It's hard to be precise. If you find that operations involving the master node (e.g. mapping updates, index creation, etc.) are taking too long, or you see warnings about timeouts to do with cluster state updates, then you should consider dedicated master nodes. 50,000 documents is quite a small amount of data, but if you're performing thousands of searches per second then you could still need dedicated masters.
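One way to watch for slow master operations is to look at the response of the pending cluster tasks API (`GET _cluster/pending_tasks`) and flag anything that has been queued for a long time. The sketch below only processes an already-parsed response. The sample data is made up for illustration and the 30 second threshold is an arbitrary assumption, so pick a value that makes sense for your cluster.

```python
def slow_pending_tasks(response, max_wait_ms=30_000):
    """Return pending cluster tasks queued longer than max_wait_ms.

    `response` is the parsed JSON body of GET _cluster/pending_tasks.
    The default threshold is an illustrative assumption, not a recommendation.
    """
    tasks = response.get("tasks", [])
    return [t for t in tasks if t.get("time_in_queue_millis", 0) > max_wait_ms]


# Made-up sample response, shaped like the pending tasks API output.
sample = {
    "tasks": [
        {
            "insert_order": 101,
            "priority": "URGENT",
            "source": "create-index [foo], cause [api]",
            "time_in_queue_millis": 86,
            "time_in_queue": "86ms",
        },
        {
            "insert_order": 46,
            "priority": "HIGH",
            "source": "put-mapping",
            "time_in_queue_millis": 45_000,
            "time_in_queue": "45s",
        },
    ]
}

for task in slow_pending_tasks(sample):
    print(task["source"], task["time_in_queue"])
```

If tasks like these regularly sit in the queue for a long time, that is a sign the master is struggling to keep up and dedicated master nodes are worth considering.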