Unbalanced disk usage - ES 2.1

varun_arora · May 9, 2018, 7:03pm

On a 5-node cluster, the shards are balanced, but the disk space used is not. Checking the directories, I found that a shard's directory takes around 10 times the shard's reported size on disk.

tickets_v2 2 p STARTED 1688677 30.5gb 10.14.23.210 Occulus
tickets_v2 2 r STARTED 1688677 28.9gb 10.14.66.191 Lila Cheney
tickets_v2 4 p STARTED 1690046 34.6gb 10.14.66.191 Lila Cheney
tickets_v2 4 r STARTED 1690046 30.1gb 10.14.17.10 Equinox
tickets_v2 1 r STARTED 1687292 26.9gb 10.14.23.210 Occulus
tickets_v2 1 p STARTED 1687292 30.1gb 10.14.23.34 Cowgirl
tickets_v2 3 p STARTED 1688535 31.3gb 10.14.65.216 Mastermind
tickets_v2 3 r STARTED 1688535 27.9gb 10.14.17.10 Equinox
tickets_v2 0 r STARTED 1688199 28.6gb 10.14.65.216 Mastermind
tickets_v2 0 p STARTED 1688198 30.2gb 10.14.23.34 Cowgirl

root@elasticsearch2:/data/elasticsearch/touch/nodes/0/indices/tickets_v2# du -sh *
299G 3
300G 4
8.0K _state

On the other node it is:

root@elasticsearch1:/data/elasticsearch/touch/nodes/0/indices/tickets_v2# du -sh *
28G 1
302G 2
8.0K _state

Why would this be, and are there any solutions?
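
To put a number on the gap, the store size from the shard listing can be compared with the `du` figure for the same shard. This is a minimal Python sketch, assuming that elasticsearch1 is the node holding shards 1 and 2 (Occulus in the listing above). `parse_size` is a hypothetical helper written for this example, not part of any Elasticsearch client.

```python
import re

# Binary multipliers; covers both the shard listing suffixes (gb) and du -h suffixes (G).
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(text):
    """Convert strings such as '30.5gb', '302G' or '8.0K' to a byte count."""
    match = re.fullmatch(r"([\d.]+)\s*([kmgt]?)b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit or "b"]

# Values copied from the post; the node mapping is an assumption (see above).
reported = {"1": "26.9gb", "2": "30.5gb"}  # store column for Occulus
on_disk = {"1": "28G", "2": "302G"}        # du -sh on elasticsearch1

for shard in sorted(reported):
    ratio = parse_size(on_disk[shard]) / parse_size(reported[shard])
    print(f"shard {shard}: on-disk / reported = {ratio:.1f}x")
# prints:
# shard 1: on-disk / reported = 1.0x
# shard 2: on-disk / reported = 9.9x
```

Under that assumption, shard 1 matches its reported size while shard 2 is roughly ten times larger on disk, so the extra space sits inside individual shard directories rather than coming from extra shards.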

warkolm · May 10, 2018, 6:08am

Elasticsearch only balances by shard count, not size.

Christian_Dahlqvist · May 10, 2018, 6:28am

That looks like a big difference. Are all nodes running exactly the same version?

varun_arora · May 10, 2018, 9:06am

Yes, all are running the same version: 2.1.1

Christian_Dahlqvist · May 10, 2018, 9:21am

Based on the shard listing, it looks like data is evenly distributed across the nodes, as each node has 2 shards that are all similar in size.
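
That reading of the listing can be checked mechanically. This is a rough Python sketch that tallies shard count and reported store size per node from the rows pasted in the first post. `per_node_totals` is a hypothetical helper, and it assumes every store value uses the `gb` suffix, as it does in that listing.

```python
from collections import defaultdict

# Rows copied from the shard listing in the first post.
CAT_SHARDS = """\
tickets_v2 2 p STARTED 1688677 30.5gb 10.14.23.210 Occulus
tickets_v2 2 r STARTED 1688677 28.9gb 10.14.66.191 Lila Cheney
tickets_v2 4 p STARTED 1690046 34.6gb 10.14.66.191 Lila Cheney
tickets_v2 4 r STARTED 1690046 30.1gb 10.14.17.10 Equinox
tickets_v2 1 r STARTED 1687292 26.9gb 10.14.23.210 Occulus
tickets_v2 1 p STARTED 1687292 30.1gb 10.14.23.34 Cowgirl
tickets_v2 3 p STARTED 1688535 31.3gb 10.14.65.216 Mastermind
tickets_v2 3 r STARTED 1688535 27.9gb 10.14.17.10 Equinox
tickets_v2 0 r STARTED 1688199 28.6gb 10.14.65.216 Mastermind
tickets_v2 0 p STARTED 1688198 30.2gb 10.14.23.34 Cowgirl
"""

def per_node_totals(listing):
    """Return {node: (shard_count, reported_gb)} for a shard listing."""
    counts = defaultdict(int)
    sizes_gb = defaultdict(float)
    for line in listing.strip().splitlines():
        # Node names may contain spaces ("Lila Cheney"), so cap the split at 8 fields.
        fields = line.strip().split(None, 7)
        store, node = fields[5], fields[7]
        counts[node] += 1
        sizes_gb[node] += float(store[:-2])  # assumes a "gb" suffix
    return {node: (counts[node], sizes_gb[node]) for node in counts}

for node, (count, gb) in per_node_totals(CAT_SHARDS).items():
    print(f"{node}: {count} shards, {gb:.1f}gb reported")
```

Each node comes out with 2 shards and between 57.4gb and 63.5gb of reported store size, which matches the even distribution described above and does not account for the roughly 300G directories seen with `du`.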

varun_arora · May 10, 2018, 10:51am

Yes. Would adding nodes and relocating the shards to the new ones help? Or would that just move the same shards from one node's disk to another?

Is there any other way you can see to alleviate this?

Could it be something in 2.x? We have not seen this issue with 5.x.

varun_arora · May 10, 2018, 11:13am

Would the merge API help here?

What I see in the New Relic plugin is that the number of documents is the same on all boxes.

Christian_Dahlqvist · May 10, 2018, 11:15am

What does GET /_nodes/stats/indices give?
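
Since the full response is long, it can help to reduce it to a few numbers per node. This is a minimal Python sketch, assuming a node is reachable at `http://localhost:9200` and that the response carries `indices.docs.count`, `indices.store.size_in_bytes` and `indices.translog.size_in_bytes` for each node. Those field names are my understanding of the node stats response rather than something shown in this thread, so any that are missing come back as `None`. `fetch_node_stats` and `summarise` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch GET /_nodes/stats/indices and return the parsed JSON."""
    with urlopen(f"{base_url}/_nodes/stats/indices") as response:
        return json.load(response)

def summarise(stats):
    """Reduce a node stats response to one small dict per node, sorted by name."""
    rows = []
    for node in stats["nodes"].values():
        indices = node.get("indices", {})
        rows.append({
            "name": node.get("name"),
            "docs": indices.get("docs", {}).get("count"),
            "store_bytes": indices.get("store", {}).get("size_in_bytes"),
            # Assumed field; None if this version does not report it.
            "translog_bytes": indices.get("translog", {}).get("size_in_bytes"),
        })
    return sorted(rows, key=lambda row: row["name"] or "")

# Usage against a live cluster (not run here):
# for row in summarise(fetch_node_stats()):
#     print(row)
```

Comparing `store_bytes` per node with the `du` output for the same node shows whether the extra disk usage is visible to Elasticsearch's own accounting.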

varun_arora · May 10, 2018, 11:27am

Is there any place where I can paste the output? It's too long.

Christian_Dahlqvist · May 10, 2018, 11:45am

Put it in a gist and link to it here.

varun_arora · May 10, 2018, 11:53am

Here it is:

https://gist.github.com/varunarora123/4d5d811e7d43f7a30564661499bf8c94

gistfile1.txt

root@elasticsearch1:~# curl localhost:9200/_nodes/stats/indices?pretty
{
  "cluster_name" : "touch",
  "nodes" : {
    "Fw7BYODAQ66rkMRHxRJx8w" : {
      "timestamp" : 1525951332993,
      "name" : "Occulus",
      "transport_address" : "xxx.xxx.xxx.xxx:9300",
      "host" : "xxx.xxx.xxx.xxx",
      "ip" : [ "xxx.xxx.xxx.xxx", "NONE" ],

(Output truncated here; the full response is in the gist linked above.)

Christian_Dahlqvist · May 10, 2018, 12:00pm

Based on that, the data seems reasonably evenly distributed across the nodes.

varun_arora · May 10, 2018, 12:13pm

Any suggestions to alleviate this? Adding a new node? Relocating shards? Or the merge API?

I am running out of ideas.

Christian_Dahlqvist · May 10, 2018, 12:33pm

I do not see what the problem is. Distribution seems even across the nodes.

varun_arora · May 10, 2018, 2:01pm

Any other clue as to why the disk usage would be bloated for a shard? Do you think it's an issue related to 2.x?

Christian_Dahlqvist · May 10, 2018, 2:20pm

I do not know, as I have not used version 2.x in quite some time. It may help if you can identify which types of files make up the difference.
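
One way to do that is to total the file sizes under each top-level entry of a shard directory. This is a small Python sketch. `size_by_subdir` is a hypothetical helper, and it sums apparent file sizes, so the figures can differ slightly from what `du` reports in disk blocks.

```python
import os

def size_by_subdir(shard_path):
    """Return {entry_name: total_bytes} for each top-level entry of shard_path."""
    totals = {}
    for entry in os.scandir(shard_path):
        total = 0
        if entry.is_dir(follow_symlinks=False):
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    try:
                        total += os.path.getsize(os.path.join(root, name))
                    except OSError:
                        # Files can disappear while a live node merges or flushes.
                        pass
        else:
            total = entry.stat(follow_symlinks=False).st_size
        totals[entry.name] = total
    return totals

# Usage on a data node (path built from the du prompts above, shard 2):
# shard = "/data/elasticsearch/touch/nodes/0/indices/tickets_v2/2"
# for name, size in sorted(size_by_subdir(shard).items(), key=lambda kv: -kv[1]):
#     print(f"{size / 1024**3:8.1f} GiB  {name}")
```

Running it on one normal shard and one bloated shard should show which subdirectory accounts for the difference.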

Wedney_Yuri · May 10, 2018, 2:23pm

Hi @Christian_Dahlqvist,

I'm experiencing a similar issue in Elasticsearch 6.2.3. The shard size is not equal across nodes.

The primary shard is using 57.8gb of storage, while the replicas are using 264.5gb.

Wedney_Yuri · May 10, 2018, 2:28pm

In the graph below you can see the difference between the primary and the replicas. This index contains only one shard.

elasticsearch.yml:

cluster.name: ${CLUSTER_NAME}
cluster.routing.allocation.awareness.attributes: aws_availability_zone
cloud.node.auto_attributes: true
plugin.mandatory: discovery-ec2,repository-s3
transport.tcp.compress: true
indices.queries.cache.size: 30%
indices.requests.cache.size: 20%
indices.memory.index_buffer_size: 20%
indices.memory.max_index_buffer_size: 512mb
action.auto_create_index: false
action.destructive_requires_name: true
node.master: ${ES_NODE_MASTER}
node.data: ${ES_NODE_DATA}
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: '*'
discovery.zen.minimum_master_nodes: ${SPLIT_BRAIN_NODES}
discovery.ec2.tag.cluster: ${CLUSTER_NAME}
discovery.ec2.endpoint: ec2.${AWS_REGION}.amazonaws.com
discovery.zen.ping_timeout: 30s
discovery.zen.hosts_provider: ec2

varun_arora · May 10, 2018, 4:19pm

In my case, even that is the same:

varun_arora · May 10, 2018, 6:22pm

I found the issue affecting me. It's the translog, which is not being flushed.

root@elasticsearch1:/data/elasticsearch/touch/nodes/0/indices/tickets_v2/2# du -sh *
33G index
4.0K _state
282G translog

Related to this bug: https://github.com/elastic/elasticsearch/pull/15830

@Christian_Dahlqvist: Shall I flush it with POST /tickets_v2/_flush? What will its effect on the application be? Would the other indices continue to serve requests?
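
For reference, the flush being asked about is a single POST against the index. This is a minimal Python sketch that only builds the requests, assuming a node is reachable at `http://localhost:9200` and that the `translog` metric of the index stats API is available on this version. Whether a flush actually trims the translog on 2.1.1, given the linked bug, is not established in this thread. `flush_request`, `translog_stats_request` and `send` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: a node listening locally

def flush_request(index):
    """Build POST /<index>/_flush without sending it."""
    return Request(f"{BASE_URL}/{index}/_flush", method="POST")

def translog_stats_request(index):
    """Build GET /<index>/_stats/translog (assumed to be supported on this version)."""
    return Request(f"{BASE_URL}/{index}/_stats/translog", method="GET")

def send(request):
    """Send a prepared request and return the parsed JSON body."""
    with urlopen(request) as response:
        return json.load(response)

# Usage against a live cluster (not run here):
# before = send(translog_stats_request("tickets_v2"))
# result = send(flush_request("tickets_v2"))
# after = send(translog_stats_request("tickets_v2"))
```

Capturing the translog stats before and after, together with `du -sh` on the shard's translog directory, would show whether the flush had any effect.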
