Alias creation timeout when a node leaves the cluster - ES 2.4.6

(jigish thakar) #1

Hi All,
We are running an ES 2.4.6 cluster (a very old version, I know) with 10 data nodes, each an i3.2xlarge.
The UI built for this cluster creates an alias for each session, using the provided search criteria.

The issue is this: when we do an intentional node reboot, we stop allocation.
If a user logs in to the portal during that time, the UI tries to create an alias, and ES either takes too long or throws a timeout.
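
For reference, here is a minimal sketch of the kind of request involved, assuming the UI goes through the `_aliases` REST endpoint. The host, index name, alias name and filter are hypothetical placeholders, not our real values. `master_timeout` is the request parameter that controls how long the request may wait for the master, and its default is 30s.

```python
import json
from urllib.parse import urlencode


def build_alias_request(host, index, alias, term_filter=None, master_timeout="30s"):
    """Return (url, body) for a POST to the _aliases endpoint.

    All argument values used below are hypothetical examples.
    """
    query = urlencode({"master_timeout": master_timeout})
    url = "{}/_aliases?{}".format(host.rstrip("/"), query)

    add_action = {"index": index, "alias": alias}
    if term_filter:
        # A filtered alias: only documents matching the term are visible.
        add_action["filter"] = {"term": term_filter}

    body = json.dumps({"actions": [{"add": add_action}]})
    return url, body


url, body = build_alias_request(
    "http://localhost:9200",        # hypothetical host
    "logs-customer42-2019.01.12",   # hypothetical per-customer daily index
    "session-abc123",               # hypothetical session alias
    {"level": "error"},             # hypothetical search criteria
    master_timeout="90s",
)
print(url)
print(body)
```

As far as I understand, a longer `master_timeout` only lets the request wait longer in the master's queue. It does not make the cluster state update itself any faster.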

I get an error like the one below on the elected master node.

[2019-01-12 09:25:06,724][DEBUG][action.admin.indices.alias] [sng-prod-es2-master1] failed to perform aliases
ProcessClusterEventTimeoutException[failed to process cluster event (index-aliases) within 30s]
        at org.elasticsearch.cluster.service.InternalClusterService$2$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$

Thanks in advance.

(Christian Dahlqvist) #2

How many indices, shards and aliases do you have in the cluster?

(jigish thakar) #3

Shards: 24604
Indices: 6126
Aliases: 4623

We are heavily dependent on aliases.

(Christian Dahlqvist) #4

It looks like you have far too many indices and shards for a cluster that size. All this takes up space in the cluster state and makes updates slower. This also applies to aliases, which are also stored in the cluster state. I would therefore recommend reducing this significantly.
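
To put the reported numbers in proportion, here is a rough back-of-the-envelope sketch. It assumes shards are spread evenly across the 10 data nodes, which may not hold in practice, and it uses the reported shard total as-is (the thread does not say whether it includes replicas).

```python
def cluster_ratios(shards, indices, aliases, data_nodes):
    """Simple ratios derived from the totals reported in this thread."""
    return {
        "shards_per_data_node": shards / data_nodes,  # assumes even spread
        "shards_per_index": shards / indices,
        "aliases_per_index": aliases / indices,
    }


ratios = cluster_ratios(shards=24604, indices=6126, aliases=4623, data_nodes=10)
for name, value in ratios.items():
    print("{}: {:.1f}".format(name, value))
```

Under those assumptions this works out to roughly 2,460 shards per data node and about 4 shards per index.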

What are you using such a large number of aliases for? Can you describe the use case?

(jigish thakar) #5

It's a logging platform with a huge number of customers. We maintain indices per customer per day, and these indices are accessed through predefined aliases.

Our cluster is also large, so I don't think size is the issue.
The timeout only happens when a huge number of shards are being relocated,
which happens during a rolling reboot.
Once the cluster is green, things return to normal, even though shard rebalancing is still going on.
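
One possible way to act on this observation, sketched under assumptions: before creating an alias, the UI could look at the `_cluster/health` response and hold off while the cluster is not green and the master has a backlog of pending tasks. The threshold and the sample health documents below are made up for illustration. Only the field names `status` and `number_of_pending_tasks` come from the health API.

```python
def should_defer_alias_update(health, max_pending_tasks=10):
    """Heuristic: defer while the cluster is not green AND the master is backed up.

    `health` is a dict parsed from a _cluster/health response.
    `max_pending_tasks` is a hypothetical threshold, to be tuned per cluster.
    """
    status = health.get("status")
    pending = health.get("number_of_pending_tasks", 0)
    return status != "green" and pending > max_pending_tasks


# Illustrative, hand-written health documents (not real cluster output).
during_reboot = {"status": "yellow", "number_of_pending_tasks": 250}
after_recovery = {"status": "green", "number_of_pending_tasks": 40}

print(should_defer_alias_update(during_reboot))   # True
print(should_defer_alias_update(after_recovery))  # False
```

This is only a client-side workaround and does not reduce the work the master has to do for each cluster state update.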