Index balance in the cluster

AClerk · July 15, 2020, 11:54pm

Hello All,
I have noticed that I have 1 server out of 5 in the cluster, that stores the majority of the data/indices.
Also, one of the nodes holds a lower number of shards.

How can I balance the indices across the nodes?
How can I balance the shards across the nodes?

This is the status of the cluster

NODE  INDICES  SHARDS
01    571      455
02    571      457
03    288      400
04    571      449
05    571      446

Thanks

warkolm · July 15, 2020, 11:57pm

What does disk usage look like on your nodes?

AClerk · July 16, 2020, 12:10am

Disk usage for data directory is

Node	Size of data/index
1	271G
2	257G
3	140G
4	277G
5	262G

So it looks like node 3 is used less by Elasticsearch.
I expected the data to be evenly split between nodes, so that each node would hold about 240G of data.
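
For reference, the expected even split can be checked with a few lines of Python. This is only a sketch using the sizes reported in this post; the function and variable names are made up for illustration.

```python
# Data directory sizes in GB, as reported per node in this post.
sizes_gb = {1: 271, 2: 257, 3: 140, 4: 277, 5: 262}

def even_share(sizes):
    """Return the per-node size if the total were split evenly."""
    return sum(sizes.values()) / len(sizes)

def deviation_from_share(sizes):
    """Return each node's difference from the even share, in GB."""
    share = even_share(sizes)
    return {node: size - share for node, size in sizes.items()}

if __name__ == "__main__":
    print(f"even share: {even_share(sizes_gb):.1f} GB")
    for node, diff in deviation_from_share(sizes_gb).items():
        print(f"node {node}: {diff:+.1f} GB")
```

The even share works out to a little over 240G, and node 3 sits roughly 100G below it.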

Steve_Mushero · July 16, 2020, 4:31am

What is the history, i.e. were there more indexes before, or were things perhaps unbalanced because that node was down while some indexes were created, etc.? Are all nodes the same in terms of disk/RAM/JVM, etc.? And what is the typical index setting for shards/replicas?

Steve_Mushero · July 16, 2020, 4:36am

And ANY routing going on, for HA in a cloud (allocation awareness), etc.? Any playing with all those settings now or in the past?

Do you often create new indexes (like daily) and close or purge them? Are there any closed indexes (which won't show in many lists, but still use space)?
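
One possible way to look for closed indexes is the status column of the cat indices API. The sketch below assumes the JSON form of `GET _cat/indices?format=json&h=index,status,store.size` and that closed indexes are listed there with status `close`; the sample rows and index names are invented, so check the output on your own version.

```python
import json

# Hypothetical sample of a `_cat/indices?format=json` response;
# the index names and sizes are invented for illustration.
SAMPLE = json.loads("""
[
  {"index": "logs-2020.07.14", "status": "open",  "store.size": "1.2gb"},
  {"index": "logs-2020.07.01", "status": "close", "store.size": null},
  {"index": "metrics-2020.28", "status": "open",  "store.size": "850mb"}
]
""")

def closed_indices(rows):
    """Return the names of indices whose status column reads 'close'."""
    return [row["index"] for row in rows if row.get("status") == "close"]

if __name__ == "__main__":
    print(closed_indices(SAMPLE))
```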

Shards (not indexes) are balanced based on disk space and other factors. There are also some settings for this (like cluster.routing.allocation.balance.shard, which I've not played with but which would seem to force re-balancing):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html

AClerk · July 16, 2020, 9:21am

Hello @Steve_Mushero
Thanks for the reply.
Here are the answers for your questions:

No serious history in the past. There were fewer indices in the past. The cluster has grown a bit in the past couple of weeks.
All VMs are identical.
Typical index setting is 1 primary and 1 replica shard. Some will have 2 primary and 2 replica.
Indices are created hourly, daily, and weekly.
Some indices are purged hourly, some weekly.
Most settings are the Elasticsearch defaults. No changes to shard settings.

About closed indices that won't show but still use space:
How can I know? How can I find out if there are any such cases?

Thanks!

Vinayak_Sapre · July 17, 2020, 5:57am

@AClerk
I am assuming cluster rebalancing is not blocked and that the node 3 disk does not hold anything other than ES data. The latter is important because available disk space matters, not the size of the disk. You can verify this using the node stats API: _nodes/<node_id>/stats/fs?pretty or _nodes/stats/fs?pretty.
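
As a rough illustration, the response from `_nodes/stats/fs` can be reduced to a per-node disk usage percentage. The sketch below assumes the `fs.total.total_in_bytes` and `fs.total.available_in_bytes` fields of that response; the node ids, names and byte counts in the sample are invented.

```python
# Hypothetical, trimmed-down response from `GET _nodes/stats/fs`;
# node ids, names and byte counts are invented for illustration.
SAMPLE_STATS = {
    "nodes": {
        "id-a": {"name": "node-1", "fs": {"total": {
            "total_in_bytes": 1000, "available_in_bytes": 400}}},
        "id-b": {"name": "node-3", "fs": {"total": {
            "total_in_bytes": 1000, "available_in_bytes": 750}}},
    }
}

def disk_used_percent(stats):
    """Map node name -> percentage of disk in use, based on fs.total."""
    result = {}
    for node in stats["nodes"].values():
        total = node["fs"]["total"]["total_in_bytes"]
        available = node["fs"]["total"]["available_in_bytes"]
        result[node["name"]] = 100.0 * (total - available) / total
    return result

if __name__ == "__main__":
    for name, pct in disk_used_percent(SAMPLE_STATS).items():
        print(f"{name}: {pct:.1f}% used")
```

If a node shows noticeably more disk in use than its ES data directory accounts for, something other than ES is occupying the disk.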

Unless you are experiencing performance issues, I wouldn't worry about it. In my experience, ES doesn't move shards unless you hit the disk high watermark or a certain node is overloaded, because moving a shard is an expensive operation: it consumes network bandwidth, incurs GC, and loses caches. When a new shard is allocated, ES takes available disk space into account, but it cannot predict how big this shard is going to grow. It treats all shards equally. Some shards may grow faster but won't be moved unless you hit a watermark.

If you are experiencing performance issues, you first need to decide which shard to move. This depends on your query volume on each index and on shard size. You can then use the cluster reroute API to move specific shard(s).
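
A minimal sketch of what such a move request could look like, assuming the `move` command of `POST _cluster/reroute`. The helper name, index name and node names are hypothetical; the code only builds the request body and sends nothing.

```python
import json

def move_shard_body(index, shard, from_node, to_node):
    """Build a `POST _cluster/reroute` body containing one move command."""
    return {
        "commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
        ]
    }

if __name__ == "__main__":
    # Hypothetical index and node names, for illustration only.
    body = move_shard_body("logs-2020.07.14", 0, "node-1", "node-3")
    print(json.dumps(body, indent=2))
```

Keep in mind that the balancer may move shards again afterwards if it disagrees with the manual placement.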

For a long-term solution, I would explore something along the lines of ILM, since you are using time-based shards. Your cluster size is small, so you need to assess this based on the query volume on the old data.

Finally, if for some reason you want all nodes to have roughly equal utilization, compute (total data size on all nodes * 100) / (5 * disk size). Then set the low and high watermarks slightly higher than that. This will force ES to rebalance. Once rebalancing completes, you can set those back to the defaults. Make sure you create hourly indices for several hours, and maybe even daily indices for a day or two, before changing the watermarks. I wouldn't recommend this, as I don't see major benefits and it can go badly wrong if not done correctly. I am including this option only for information.
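
The calculation can be sketched as follows. The formula is the one given in this post; the margins added on top of it are arbitrary illustrative values, and the settings body assumes the `cluster.routing.allocation.disk.watermark.low` and `.high` cluster settings applied through `PUT _cluster/settings`. Nothing is sent to a cluster here.

```python
def target_usage_percent(total_data_gb, node_count, disk_size_gb):
    """(total data size on all nodes * 100) / (node count * disk size)."""
    return total_data_gb * 100.0 / (node_count * disk_size_gb)

def watermark_settings(usage_percent, low_margin=2.0, high_margin=4.0):
    """Build a cluster settings body with watermarks just above the usage.

    The margins are assumptions for illustration, not recommendations.
    """
    return {
        "transient": {
            "cluster.routing.allocation.disk.watermark.low":
                f"{usage_percent + low_margin:.0f}%",
            "cluster.routing.allocation.disk.watermark.high":
                f"{usage_percent + high_margin:.0f}%",
        }
    }

if __name__ == "__main__":
    # Hypothetical numbers: 1000 GB of data across 5 nodes with 2000 GB disks.
    usage = target_usage_percent(1000, 5, 2000)
    print(usage, watermark_settings(usage))
```

Using transient settings here is a deliberate choice in the sketch, since the watermarks are meant to be set back once rebalancing completes.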

AClerk · July 17, 2020, 6:35am

Hi @Vinayak_Sapre
thanks for your reply.
I do have performance issues and I am trying to find the root cause.
It is not necessarily because of nodes balance.

I have more than 2TB available on the disks for each node. So I think this is not an issue.

Thanks!

Vinayak_Sapre · July 17, 2020, 6:58am

@AClerk

I would analyze the slow query log and look at the shard size / distribution of those indices. Also check whether the queries are written correctly.

Are you seeing significantly different CPU / IO utilization on this node?

Christian_Dahlqvist · July 17, 2020, 7:08am

Based on the different time periods covered by the indices it sounds like you could be having indices of very different sizes. How large is your largest index? How many indices around that size do you have? How large is your smallest index?

AClerk · July 20, 2020, 12:15am

@Christian_Dahlqvist
Size by doc count? Store size?
I might have indices of different sizes. How is that affecting the cluster?

@Vinayak_Sapre
I am analysing slowlogs.
Still trying to understand if that node is acting differently and how.

Steve_Mushero · July 20, 2020, 6:55am

Slow queries won't affect where shards go. We have a new visual view we are working on, for the cluster and by index, to see where things are going; a couple of other tools have a bit of that. But there must be some reason for the imbalance over time, especially if you are creating new indexes all the time.

By size, I'm pretty sure Christian means store size in bytes; doc counts don't matter, as documents can be 10B or 100MB each. Size might affect where things go, especially if huge, I guess.

Christian_Dahlqvist · July 20, 2020, 7:04am

I am wondering whether you may have a few very large shards that would skew the balance.
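
One way to check this is to rank shards by store size. The sketch below assumes rows shaped like the JSON output of `GET _cat/shards?format=json&bytes=b&h=index,shard,prirep,store,node`; the sample rows, index names and sizes are invented.

```python
# Hypothetical rows in the shape of a `_cat/shards?format=json&bytes=b`
# response; with bytes=b the store column is a plain byte count.
SAMPLE_SHARDS = [
    {"index": "weekly-2020.28", "shard": "0", "prirep": "p",
     "store": "5000", "node": "node-1"},
    {"index": "hourly-2020.07.19-23", "shard": "0", "prirep": "p",
     "store": "40", "node": "node-3"},
    {"index": "daily-2020.07.19", "shard": "1", "prirep": "r",
     "store": "900", "node": "node-2"},
]

def largest_shards(rows, top=2):
    """Return the `top` largest shards, skipping rows with no store size."""
    sized = [r for r in rows if r.get("store") is not None]
    return sorted(sized, key=lambda r: int(r["store"]), reverse=True)[:top]

def bytes_per_node(rows):
    """Sum store bytes per node, ignoring unassigned shards."""
    totals = {}
    for r in rows:
        if r.get("store") is None or r.get("node") is None:
            continue
        totals[r["node"]] = totals.get(r["node"], 0) + int(r["store"])
    return totals

if __name__ == "__main__":
    for r in largest_shards(SAMPLE_SHARDS):
        print(r["index"], r["shard"], r["prirep"], r["store"], r["node"])
    print(bytes_per_node(SAMPLE_SHARDS))
```

If a handful of shards account for most of the bytes on the fuller nodes, shard counts can look balanced while disk usage does not.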

system · August 17, 2020, 7:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
