I am setting up a 6-node Elasticsearch cluster with 4 nodes as masters and 2 as data nodes, but my cluster is not working as expected.
Can somebody please help me with the complete cluster setup (elasticsearch.yml) configuration?
Please format your code, logs or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.
Or use markdown style like:
```
CODE
```
This is the icon to use if you are not using markdown format:
There's a live preview panel for exactly this reason.
Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.
What is the output of:
```
GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
```
If some outputs are too big, please share them on gist.github.com and link them here.
I feel the problem is in the elasticsearch.yml configuration, because if I stop the Elasticsearch service on any node, or bring any node down, my cluster status still shows "green". For example, please see the link below. I hope you get my point now; kindly share your opinion.
I think you misunderstand what it means for an Elasticsearch cluster to be in a "green" state.
For an Elasticsearch cluster to be green all its shards, that is all the data in all the indices you've created plus all the replica shards too, must be available for queries (and for indexing). It doesn't matter how many nodes you have, as long as you have enough of them to spread the primary and replica shards across (a replica can't live on the same node as its primary shard, so you need at least 2 data nodes for a well-formed, redundant cluster).
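As a sketch, the replica count is set per index at creation time; for example (the index name and shard counts here are hypothetical):

```
PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```

With `number_of_replicas: 1` each primary shard gets one replica, and since a replica must live on a different node than its primary, that is where the two-data-node minimum comes from.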
So, when you stop a node your cluster will return to green because the shards on the stopped node get replicated to the other nodes in the cluster. This replication may take time, during which the cluster will be in a yellow state, but once all shards are successfully replicated your cluster will be green by definition. The missing node has no effect because all the data is still available.
The Elasticsearch master node has no memory of former nodes, only of the nodes currently in the cluster, so if you stop one or more nodes the master will simply remove them from its cluster state and won't go looking for them again. From the master's perspective the cluster is OK (green) as soon as all shards from the stopped nodes have been replicated. And that's the only sane definition of green / OK.
Yes, I know the functionality of an Elasticsearch cluster.
So you mean to say my cluster status will still show "green" even if 2 or 3 nodes go down, as all the shards will be allocated across the rest of the available nodes?
But I have referred to the link below, and as per the guideline and my knowledge, if multiple nodes go down for a longer time the shards will not be allocated across the rest of the VMs.
Feel free to provide feedback on my understanding of this; if it is correct, I believe we are on the same page in this discussion. Looking forward to your response.
I also request you to go through my current elasticsearch.yml configuration and give feedback if any changes are required.
That is correct. If all the shards, primary and replica, can be distributed on the remaining data nodes, the cluster will be in a green state once they are all synced and available.
As far as I can see there is no mention of nodes with respect to cluster status in the document you refer to; what that document says about cluster-level status is:
The cluster status is controlled by the worst index status.
So the cluster's health is determined by the worst index which again is determined by the worst shard. So if the worst index is missing a replica it is yellow, which means the cluster is in yellow. But if the worst index is missing a primary, which can happen if you stop two data nodes with one of the nodes holding the primary and the other node the replica shard, then that index is in a red state which means the cluster is also in a red state.
Notice that the number of nodes has nothing to do with how the state of the cluster is computed, only the shards. So as long as you keep the shards well distributed, your cluster will stay green even if you have stopped one or more nodes.
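One way to see this "worst index" computation in action is to ask the cluster health API for index-level detail (the `level` parameter is part of that API):

```
GET /_cluster/health?level=indices
```

The response includes a status per index, and the top-level `status` field is the worst of them.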
It's hard to give a good answer as I don't know your use cases, your hardware and CPU specs or even which Elasticsearch version you're using. But in general:
3 master-eligible nodes are sufficient, as that is the minimum needed to form a quorum during master election. So I would change one of the master-eligible nodes to be a data node (node.data: true) and set discovery.zen.minimum_master_nodes: 2 in the elasticsearch.yml files, since the quorum of 3 master-eligible nodes is (3 / 2) + 1 = 2.
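A minimal sketch of what that split could look like, assuming the pre-7.x node.master / node.data settings and hypothetical node names:

```yaml
# elasticsearch.yml on the three master-eligible nodes
# (node names here are hypothetical)
cluster.name: my-cluster
node.name: master-1                      # master-2, master-3 on the others
node.master: true
node.data: false
discovery.zen.minimum_master_nodes: 2    # quorum of 3 master-eligible nodes

# elasticsearch.yml on the three data nodes
# node.name: data-1 / data-2 / data-3
# node.master: false
# node.data: true
```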
I haven't set bootstrap.memory_lock: true in any of my clusters so I have no experience with this setting, but be aware of this warning in the official docs that mlockall might cause the JVM or shell session to exit if it tries to allocate more memory than is available.
The high and low disk watermarks depend on how much disk space you have on your data nodes and on your use cases. For me it's more natural to think in percentages than in GB, but that's just a matter of preference. I always try to keep disk usage below 75% on my data nodes, in case I need extra space for re-indexing or in case one of the data nodes falls out of the cluster and its shards get relocated to the other data nodes.
By the way, I think you've got the low, high and flood_stage watermarks backwards. The values should be low < high < flood_stage: the low watermark marks when the data node won't accept more shards, the high watermark marks when the data node will start to move shards away to free space, and the flood_stage watermark marks a critically full disk (95% by default), at which point Elasticsearch will mark indices as read-only to stop more data from being indexed on that node.
For me the default watermark levels, 85% for low, 90% for high and 95% for flood_stage, are usually good enough so I rarely change these in my elasticsearch.yml files.
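For reference, spelling those defaults out explicitly in elasticsearch.yml would look something like this (these are the documented defaults, so you normally don't need to set them at all):

```yaml
cluster.routing.allocation.disk.watermark.low: "85%"
cluster.routing.allocation.disk.watermark.high: "90%"
cluster.routing.allocation.disk.watermark.flood_stage: "95%"
```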
I hope this helps you configure your cluster. Good luck!
Reading the documentation one more time, I see that I was wrong here: if you give the watermark level in GB it refers to how much free disk space you have left. So you're right.
But I am slightly puzzled why the watermark settings differ if you specify percentage and GB:
```
cluster.routing.allocation.disk.watermark.low: "85%"   # kicks in when you have used more than 85% of the disk
cluster.routing.allocation.disk.watermark.low: "20gb"  # kicks in when you have less than 20 GB of free disk left
```
In my head this is very puzzling and makes it hard to switch between percentage and GB.
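To make the two forms comparable, take a hypothetical data node with a 200 GB disk: 85% used means 15% free, i.e. 30 GB free, so these two settings would trigger at the same point:

```yaml
# on a hypothetical 200 GB disk these are equivalent:
cluster.routing.allocation.disk.watermark.low: "85%"   # more than 170 GB used
cluster.routing.allocation.disk.watermark.low: "30gb"  # less than 30 GB free
```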