Skewed shards on an index even with enough disk space on all physical nodes

Hi everyone,

I have an index (shown below) whose shards are not evenly distributed across my cluster of 10 data nodes.

[screenshot "elastic2": per-node shard distribution for the index]

Each node in the cluster has at least 1.1 TB free; the only exception is arm-lc-004, which has ~700 GB free, but that should still be plenty.
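(For reference, per-node free space and shard counts can be checked with the allocation cat API; localhost:9200 below is just a placeholder for the cluster endpoint.)

# Per-node disk usage and shard counts, sorted by available disk
curl -s 'http://localhost:9200/_cat/allocation?v&s=disk.avail'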

Where should I look to diagnose the root cause of these skewed shards?

Can you share the output of GET _cat/shards? Use https://gist.github.com/ if it doesn't fit here.
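For example, something like this dumps it to a file you can attach or gist (localhost:9200 is a placeholder; add authentication if your cluster needs it):

# Dump all shards with column headers, largest first, into a text file
curl -s 'http://localhost:9200/_cat/shards?v&s=store:desc' > es-shards.txt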

Hi Dave,

Output is here

Appreciate any input,

Regards,
Vinicio

It looks like Elasticsearch is aware of the imbalance and is moving shards to address it. This rebalancing process takes time.

$ grep es-shards.txt -e RELOCATING
daas-arm-prod-users-2019-10                                            4  r RELOCATING 13259050   59.4gb 10.187.72.6  arm-lc-004_data -> 10.187.72.4 Be-sYmy7TJquJWCAmZ2aSA arm-lc-002_data
daas-arm-prod-users-2019-09                                            3  p RELOCATING 16503328   69.7gb 10.187.72.6  arm-lc-004_data -> 10.187.72.4 Be-sYmy7TJquJWCAmZ2aSA arm-lc-002_data
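If you want to watch those relocations progress, the recovery cat API with active_only set shows only the copies still in flight (host again a placeholder):

# Only recoveries that are still running; relocating shards appear here with their source and target nodes
curl -s 'http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,stage,source_node,target_node,bytes_percent'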

Some of your indices have far too many shards that are far too small. E.g. daas-arm-int-dataflow-2019-10 has 10 shards all smaller than 70MB. Indices like this should have one shard.
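A rough sketch of how you could collapse such an index with the shrink API; the node name is taken from your output and the target index name is only illustrative:

# 1. Block writes and require a copy of every shard on one node
curl -s -X PUT 'http://localhost:9200/daas-arm-int-dataflow-2019-10/_settings' \
  -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "arm-lc-002_data"
  }
}'

# 2. Once that relocation has finished, shrink into a new single-shard index
curl -s -X POST 'http://localhost:9200/daas-arm-int-dataflow-2019-10/_shrink/daas-arm-int-dataflow-2019-10-shrunk' \
  -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}'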

You have some excessively tiny daily indices too, e.g. kafka-metrics-* and jmx-*. These would be better as one-shard monthly indices.
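One way to do that consolidation is a single-shard monthly target plus _reindex; the index names and date pattern below are illustrative, so adjust them to match yours:

# Create a one-shard monthly index
curl -s -X PUT 'http://localhost:9200/kafka-metrics-2019-10' \
  -H 'Content-Type: application/json' -d '{
  "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 1 }
}'

# Copy the month's daily indices into it (a comma-separated list of the
# daily index names works here too)
curl -s -X POST 'http://localhost:9200/_reindex' \
  -H 'Content-Type: application/json' -d '{
  "source": { "index": "kafka-metrics-2019.10.*" },
  "dest":   { "index": "kafka-metrics-2019-10" }
}'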

Other indices look to be time-based but are surprisingly large. E.g. daas-arm-prod-users-2019-10 has 20 shards, all in the region of 60GB. 60GB shards are fine, but why put 20 of them into one index? Would you be able to have fewer shards at once and use rollover to start a new index when the old one gets too big?
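A minimal rollover setup, assuming a write alias; the alias name and thresholds are only illustrative:

# Bootstrap the first index behind a write alias
curl -s -X PUT 'http://localhost:9200/daas-arm-prod-users-000001' \
  -H 'Content-Type: application/json' -d '{
  "aliases": { "daas-arm-prod-users-write": { "is_write_index": true } }
}'

# Call periodically (or let ILM do it): a new index such as
# daas-arm-prod-users-000002 is created once either condition is met
curl -s -X POST 'http://localhost:9200/daas-arm-prod-users-write/_rollover' \
  -H 'Content-Type: application/json' -d '{
  "conditions": { "max_size": "300gb", "max_age": "30d" }
}'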

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.