CPU at 100% with X-Pack Security Enabled - ES 7.3

I am using an Elasticsearch 7.3 cluster that consists of 3 master nodes and 10 data nodes.

The cluster contains almost 100 active indices and 10,000 aliases that are used to write to those indices.
The cluster load is about 15-20k bulk ingestions per minute.

Now here is the problem:

  1. I tried the following config:

    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: full
    xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12

    CPU usage was at its peak.

  2. Second, I changed the verification mode to:

    xpack.security.transport.ssl.verification_mode: certificate

    CPU usage was still at its peak.

  3. Third, I changed the verification mode to:

    xpack.security.transport.ssl.verification_mode: none

    CPU usage was still at its peak.

  4. Fourth, I removed the transport-layer SSL settings completely, keeping only the config below:

    xpack.security.enabled: true
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12

    CPU usage was still at its peak.

  5. Finally, I disabled X-Pack security completely, and CPU usage dropped from 100% to 10-20% or less.

I checked the hot threads under each of the configurations above, and in every case the CPU was being consumed by the http_server_worker and transport_worker threads.
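The thread-pool attribution described above can be checked mechanically by tallying the per-thread summary lines of the hot threads output. A minimal sketch, assuming the line format seen in the hot threads result in this thread (a percentage, a timing in parentheses, then a thread name of the form elasticsearch[node][pool][T#n]); the function name cpu_by_pool is hypothetical:

```python
import re
from collections import defaultdict

# Matches per-thread summary lines such as:
# 88.3% (441.3ms out of 500ms) cpu usage by thread 'elasticsearch[node][http_server_worker][T#7]'
# Threads not named elasticsearch[...][pool][...] (e.g. 'Common-Cleaner') are ignored.
THREAD_LINE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \([^)]*\) cpu usage by thread "
    r"'elasticsearch\[[^\]]*\]\[(?P<pool>[^\]]+)\]"
)


def cpu_by_pool(hot_threads_text: str) -> dict[str, float]:
    """Sum the reported per-thread CPU percentages for each thread pool.

    The values are per-thread, so a pool's total can exceed 100% on
    multi-core nodes or when the text covers several nodes.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in hot_threads_text.splitlines():
        match = THREAD_LINE.search(line)
        if match:
            totals[match.group("pool")] += float(match.group("pct"))
    return dict(totals)


if __name__ == "__main__":
    sample = (
        "   88.3% (441.3ms out of 500ms) cpu usage by thread "
        "'elasticsearch[datapoints_default_cs1-r52xl-data3][http_server_worker][T#7]'\n"
        "    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'\n"
    )
    print(cpu_by_pool(sample))  # {'http_server_worker': 88.3}
```

Feeding the full hot threads text for all nodes into the function gives a quick per-pool view instead of reading each stack dump by hand.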

I am also attaching screenshots of the CPU usage graphs and the hot threads results.

It seems there is a bug in the X-Pack auth system that consumes a lot of CPU.

Hot threads result 2 (this is only part of the output; the full result is at https://slack-files.com/T02FYRSTM-FRSDW5TCN-51434a66f4):
    Hot threads at 2019-12-15T13:52:24.340Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

    java.base@12.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
       java.base@12.0.1/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
       app//org.elasticsearch.bootstrap.Bootstrap$1.run(Bootstrap.java:83)
       java.base@12.0.1/java.lang.Thread.run(Thread.java:835)

    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'
     10/10 snapshots sharing following 5 elements
       java.base@12.0.1/java.lang.Object.wait(Native Method)
       java.base@12.0.1/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
       java.base@12.0.1/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
       java.base@12.0.1/java.lang.Thread.run(Thread.java:835)
       java.base@12.0.1/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)

    ::: {datapoints_default_cs1-r52xl-data3}{KjC3JfvLS7WDN_u-VckRVw}{bbC4KJecRruQeyoHP6pu3Q}{172.31.32.90}{172.31.32.90:9300}{di}{rack_id=dpp, xpack.installed=true}
       Hot threads at 2019-12-15T13:52:24.342Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

       88.3% (441.3ms out of 500ms) cpu usage by thread 'elasticsearch[datapoints_default_cs1-r52xl-data3][http_server_worker][T#7]'
         9/10 snapshots sharing following 285 elements

CPU usage is high whenever X-Pack security is enabled, in any mode, and low without X-Pack security.

      app//org.apache.lucene.util.automaton.CharacterRunAutomaton.run(CharacterRunAutomaton.java:48)
       org.elasticsearch.xpack.core.security.support.Automatons$1.test(Automatons.java:219)
       org.elasticsearch.xpack.core.security.support.Automatons$1.test(Automatons.java:216)
       org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizedIndicesFromRole(RBACEngine.java:475)
       org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:312)

The root of the issue appears to be related to the roles your users have, and to resolving their index patterns against the indices and aliases in your cluster.
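A rough way to see why the alias count matters: in the stack trace, RBACEngine.resolveAuthorizedIndicesFromRole calls Automatons$1.test, which runs a Lucene CharacterRunAutomaton, suggesting a pattern test per index or alias name. The sketch below models that as "test every name against the role's patterns on each request". It uses Python's fnmatch as a stand-in for the automata and assumes one full resolution per bulk request; both are simplifying assumptions, not Elasticsearch's actual implementation, and the index and alias names are hypothetical.

```python
from fnmatch import fnmatchcase


def authorized_names(patterns: list[str], names: list[str]) -> list[str]:
    """Return the names matched by any pattern; every name is tested on every call."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


def pattern_tests_per_minute(num_names: int, requests_per_minute: int) -> int:
    """Name checks per minute if each request resolves against all names once."""
    return num_names * requests_per_minute


if __name__ == "__main__":
    # Hypothetical names sized like this thread: ~100 indices plus 10,000 aliases.
    names = [f"index_{i}" for i in range(100)] + [f"alias_{i}" for i in range(10_000)]
    # A superuser-style "*" pattern still has to test each of the 10,100 names.
    print(len(authorized_names(["*"], names)))
    # Upper end of the stated load: 20,000 bulk requests per minute.
    print(pattern_tests_per_minute(len(names), 20_000))
```

Under these assumptions the upper end of the stated load works out to 10,100 × 20,000 = 202,000,000 name checks per minute, which is why the number of aliases, rather than TLS, is the suspect.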

Given that you have almost 100 active indices and 10,000 aliases, I suspect you are doing something unusual here.
Can you provide more detail about your roles and the aliases you are using?
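One way to gather the alias detail requested here is GET _cat/aliases?format=json, whose rows include alias and index fields. A minimal sketch that counts aliases per index from that output, assuming the response has already been fetched and parsed into a list of dicts; the helper name and sample rows are hypothetical:

```python
from collections import Counter


def aliases_per_index(cat_aliases_rows: list[dict]) -> Counter:
    """Count how many aliases point at each index in _cat/aliases JSON rows."""
    return Counter(row["index"] for row in cat_aliases_rows)


if __name__ == "__main__":
    # Hypothetical rows in the shape returned by GET _cat/aliases?format=json
    rows = [
        {"alias": "w_alias_a", "index": "client_name_45m_201952"},
        {"alias": "w_alias_b", "index": "client_name_45m_201952"},
        {"alias": "w_alias_c", "index": "client_name2_48m_201952"},
    ]
    for index, count in aliases_per_index(rows).most_common():
        print(index, count)
```

Sorting with most_common() puts the indices carrying the most aliases first, which is the part most relevant to the authorization cost.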

Hi @TimV, thanks for looking into this.
I am only using the built-in elastic user.
As for indices and aliases: our indices rotate weekly, and the index names look like client_name_45m_201952, client_name2_48m_201952, etc.
The aliases look like this:

    client_name_45m_201952: {w_alias_name:{u'filter': {u'term': {u'primaryName': u'value'}}}}
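The alias definition above is a Python 2 dict repr of a filtered alias. A minimal sketch that turns that shape into the body of a POST _aliases request (an add action with index, alias and filter); the helper name is hypothetical and the filter values are the placeholders from the post:

```python
import json


def to_alias_actions(alias_spec: dict) -> dict:
    """Convert {index: {alias: {"filter": ...}}} into a POST _aliases request body."""
    actions = []
    for index, aliases in alias_spec.items():
        for alias, options in aliases.items():
            add = {"index": index, "alias": alias}
            if "filter" in options:
                add["filter"] = options["filter"]
            actions.append({"add": add})
    return {"actions": actions}


if __name__ == "__main__":
    # The posted definition with the u'' prefixes dropped.
    spec = {
        "client_name_45m_201952": {
            "w_alias_name": {"filter": {"term": {"primaryName": "value"}}}
        }
    }
    print(json.dumps(to_alias_actions(spec), indent=2))
```

Each entry produces its own add action, so 10,000 such aliases mean 10,000 extra names that, per the explanation above, have to be resolved against the user's role patterns.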

The numbers I mentioned (100 indices and 10,000 aliases) are correct, and we roll over the indices weekly.