License validation seems to take very long

Hi everyone,
we are currently running a 5.6 cluster with 5 nodes, 4 mdi (master/data/ingest) and 1 coordinating node, and a valid X-Pack license.
When we restart a node it takes forever (about 8 minutes) until the license is validated and the node is allowed to find the master. This raises another problem: authentication to realm ldap1 fails due to invalid credentials.

[2019-06-17T19:47:25,999][WARN ][o.e.x.s.a.AuthenticationService] [worker2] Authentication to realm ldap1 failed - authenticate failed (Caused by LDAPException(resultCode=49 (invalid credentials), errorMessage='invalid credentials'))
[2019-06-17T19:49:27,311][INFO ][o.e.l.LicenseService     ] [worker2] license [license ID] mode [platinum] - valid
[2019-06-17T19:49:27,344][WARN ][o.e.c.s.ClusterService   ] [worker2] cluster state update task [zen-disco-receive(from master [master {worker3}{ID}{ID}{IP}{IP}{ml.max_open_jobs=10, ml.enabled=true} committed version [7661]])] took [8.1m] above the warn threshold of 30s

Is there a reason for that?
If more information about our setup is needed, feel free to ask and I'll provide as much as I can.

Thank you very much,

Given that you have a platinum-level license, you should reach out to your Support team for assistance here :slight_smile:

Hi Mark,
I tried to get in touch with support, but apparently we have a weird (startup) license that comes with no support. I guess we get all the perks of the software but none of the Elasticsearch expertise! :slight_smile:

Anyway, if you have any idea about where I should start looking to find out why it takes so long to validate the license that would be great.

I tried on an AWS cluster with the non-production license, and there the license is always validated quite quickly. However, those machines are not very busy.

Thanks,

@TimV might have an idea?

This description seems to imply a causation that is the reverse of reality.
The license is stored in the cluster state, so a newly started node cannot activate the license until it connects to the master and receives an up-to-date copy of the license state.

That is, your cluster formation isn't being delayed due to license checks, it's that the license checks are delayed because cluster formation is slow.
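
As a quick sanity check (this assumes default host/port and credentials for an elastic superuser; adjust to your environment), you can ask a node which license it currently holds:

# Show the license held in the cluster state (5.x endpoint)
curl -u elastic:changeme 'http://localhost:9200/_xpack/license?pretty'

Until the node has joined the cluster you shouldn't expect this to report the platinum license.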

This could be caused by a variety of issues, but my guess is that this is because your native realm is unavailable when the node is disconnected from the cluster (because the security index is not available), so authentication requests that should be handled by the native realm are falling through to your LDAP realm, and failing.
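
For context, realms are consulted in their configured order, so a chain like the sketch below (realm names, order and LDAP details are illustrative, not taken from your config) would behave exactly that way: while the security index is unavailable the native realm can't answer, and the request ends up at the LDAP realm.

# Illustrative 5.x realm chain in elasticsearch.yml - names, order and URL are assumptions
xpack.security.authc.realms:
  native1:
    type: native
    order: 0            # tried first, but needs the .security index, i.e. a formed cluster
  ldap1:
    type: ldap
    order: 1            # tried next, so requests fall through here while the node is isolated
    url: "ldaps://ldap.example.com:636"
    # other required LDAP settings (bind_dn, user search, etc.) omitted for brevity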

This is the issue that we really need to solve, but it doesn't look like we have much information to go on.

What sort of network do you have between the nodes in your cluster?

Ping @DavidTurner in case he has some suggestions (but bear in mind that 5.6 is getting old).

Given that cluster updates seem slow, I would first look at how many indices and shards you have in the cluster. Having too many shards often leads to this type of problem. How many shards do you have in your cluster? What is the hardware specification of the nodes in the cluster?
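
If it's easier, those numbers can be read straight off the cluster (assuming default host/port and suitable credentials):

# Total number of shards across the cluster
curl -s -u elastic:changeme 'http://localhost:9200/_cat/shards' | wc -l

# Per-index overview, including health, status and shard counts
curl -u elastic:changeme 'http://localhost:9200/_cat/indices?v'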

5.6 is EOL - https://www.elastic.co/support/eol

Hey, yes, ≥ 8 minutes to apply a cluster state looks like the problem here. I would like you to set the following setting in elasticsearch.yml on the problematic node:

logger.org.elasticsearch.cluster.service: TRACE

Then restart the node, wait for it to join the cluster, and then provide all the logs that it emitted since it started up. If you need to redact any information then please make it obvious that it's been redacted. It'd be useful if you didn't redact the node IDs as you did in the OP. These IDs are randomly-generated and contain no information except their identity.
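
A simple way to tell when the restarted node has joined again (again assuming default host/port and credentials) is to watch the nodes list from another node in the cluster:

# The restarted node should reappear in this list once it has joined
curl -u elastic:changeme 'http://localhost:9200/_cat/nodes?v'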

I also echo @Christian_Dahlqvist's questions about the number of indices and shards in this cluster, and @warkolm's point that 5.6 is past the end of its supported life and you should be working towards upgrading to a supported version as a matter of some urgency.

We have 5 nodes on 4 different machines. Each Elasticsearch node is an mdi node except for the coordinating node. Each machine has 128 GB of RAM, with 32 GB of heap for the mdi nodes and 4 GB for the coordinating node. In general each machine has about 30 GB of free RAM. On the CPU side, all machines have 32 CPUs.
The network between the nodes is 10Gb (fiber?).

We have a total of 3220 indices and 5240 shards, but of those 3220, 1800 are closed and 500 are special.

We added the coordinating node as a temporary solution while we were enabling X-Pack security.

We have started planning how to migrate to 7.1.0, but it will take us some months to get there.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.