Disk Space Utilization on Single Node

Hello,

I have configured an Elasticsearch cluster.
I am using version 2.4.4.
My environment contains one Master Node and two Data Nodes.
I am loading data into Elasticsearch using Logstash.

Please refer to the following output configuration from my Logstash pipeline.

elasticsearch {
    hosts => ["",""]
    index => "%{[@index_name][es_index]}_%{+YYYY_M}"
    document_type => "%{[@index_type][es_document_type]}"
}

What I can see is that disk space utilization is always high on a single data node.
I was expecting it to be the same on both data nodes, since I have set number_of_replicas to 1.
I am not able to understand this behavior of my cluster.

Please refer to the following image, which shows my disk space utilization.

Color Codes:

Orange : Master Node
Green : Data Node 1
Blue : Data Node 2

You can see that Data Node 1 (green line) is much higher than the others.

Could anybody please help me understand this behavior, so that I can configure my environment correctly?

Other things I am trying to understand:

  • Is data also stored on the Master Node if I provide the IP of the Master Node?

  • How beneficial would it be to configure a Client Node?

Thank you


A master-only node never stores data. You need to find out which data is on which node (take a look at the cat APIs, especially the shards one) and check whether the data is distributed evenly or whether one big shard is eating up all the space. It is also possible that Elasticsearch is not what is filling the disk; that can't be determined from the information provided.
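
As a rough illustration of that check, here is a minimal Python sketch that sums shard store sizes per node from the text returned by the cat shards API. It assumes the output was requested with the columns in a fixed order and sizes in bytes, for example `GET _cat/shards?h=index,shard,prirep,state,store,node&bytes=b`. The function name, index names, node names and sizes in the sample are made up for the example and are not taken from the cluster in question.

```python
from collections import defaultdict


def store_bytes_per_node(cat_shards_text):
    """Sum shard store sizes (in bytes) per node from _cat/shards output.

    Assumes the columns are, in order: index shard prirep state store node
    """
    totals = defaultdict(int)
    for line in cat_shards_text.splitlines():
        # Node names may contain spaces, so split into at most 6 fields.
        fields = line.split(None, 5)
        # Skip header rows and shards that are not fully started
        # (unassigned shards have no store size or node column).
        if len(fields) < 6 or fields[3] != "STARTED":
            continue
        totals[fields[5].strip()] += int(fields[4])
    return dict(totals)


# Hypothetical sample output, not real data from the poster's cluster.
sample = """\
logs_2017_5 0 p STARTED 5000 data-node-1
logs_2017_5 0 r STARTED 5000 data-node-2
logs_2017_6 0 p STARTED 90000 data-node-1
logs_2017_6 0 r UNASSIGNED
"""

for node, size in sorted(store_bytes_per_node(sample).items()):
    print(node, size)
```

In this made-up sample the replica of one index is unassigned, so one node ends up holding far more data than the other. A large gap between nodes in the real output would point at shard allocation; similar totals would suggest that something other than Elasticsearch is using the disk.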


Make sure all nodes in the cluster are running exactly the same version of Elasticsearch. If the versions differ, shards created on a node with a newer version cannot be replicated to a node with an older version. This can lead to the kind of imbalance you are seeing.
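
A quick way to compare versions is the cat nodes API. The sketch below is only an illustration: it assumes the output was requested as `GET _cat/nodes?h=version,name` (version first, so node names containing spaces stay intact), and the function names, node names and the second version number in the sample are hypothetical.

```python
def versions_by_node(cat_nodes_text):
    """Map node name -> version from `_cat/nodes?h=version,name` output."""
    versions = {}
    for line in cat_nodes_text.splitlines():
        # The first field is the version; the rest of the line is the node name.
        parts = line.split(None, 1)
        if len(parts) == 2:
            versions[parts[1].strip()] = parts[0]
    return versions


def has_mixed_versions(cat_nodes_text):
    """Return True if the nodes report more than one distinct version."""
    return len(set(versions_by_node(cat_nodes_text).values())) > 1


# Hypothetical sample output, not real data from the poster's cluster.
sample_nodes = """\
2.4.4 master-node
2.4.4 data-node-1
2.4.1 data-node-2
"""

print(versions_by_node(sample_nodes))
print("mixed versions:", has_mixed_versions(sample_nodes))
```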
