CPU at 100% with X-Pack security enabled - ES 7.3

I am running an Elasticsearch 7.3 cluster that consists of:
3 master nodes,
10 data nodes

The cluster contains almost 100 active indices and 10,000 aliases that are used to write to those indices.
The load on the cluster is about 15-20k bulk ingestions per minute.

Now here is the problem:

  1. I first tried the config below:

    xpack.security.enabled: true
    xpack.security.transport.ssl.enabled: true
    xpack.security.transport.ssl.verification_mode: full
    xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12

    CPU usage was at its peak.

  2. For the second config, I changed the verification mode to:

    xpack.security.transport.ssl.verification_mode: certificate

    CPU usage was still at its peak.

  3. For the third config, I changed the verification mode to:

    xpack.security.transport.ssl.verification_mode: none

    CPU usage was still at its peak.

  4. For the fourth attempt, I removed the transport-layer SSL settings completely,
    leaving only the config below:

    xpack.security.enabled: true
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.keystore.path: /etc/elasticsearch/ssl/cert.p12
    xpack.security.http.ssl.truststore.path: /etc/elasticsearch/ssl/cert.p12

    CPU usage was still at its peak.

  5. I removed X-Pack security completely, and CPU usage dropped from 100% to under 10-20%.

I checked hot threads under each of the configurations above, and every time the CPU was being consumed by http_server_worker and transport_worker threads.
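For anyone who wants to repeat this check, here is a minimal sketch that sums the CPU percentages in a hot threads dump (the text returned by `GET _nodes/hot_threads`) per thread pool. The function name `cpu_by_pool` is made up for illustration, and the parsing assumes the line format and the `elasticsearch[<node>][<pool>][T#<n>]` thread naming seen in the output further down; other versions may format it differently.

```python
import re
from collections import defaultdict

# Matches summary lines such as:
#   88.3% (441.3ms out of 500ms) cpu usage by thread 'elasticsearch[node][http_server_worker][T#7]'
THREAD_LINE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread '(?P<name>[^']+)'"
)
# Assumed naming of Elasticsearch worker threads: elasticsearch[<node>][<pool>][T#<n>]
POOL = re.compile(r"^elasticsearch\[[^\]]*\]\[(?P<pool>[^\]]+)\]")


def cpu_by_pool(hot_threads_text):
    """Sum the reported CPU percentages per thread pool (hypothetical helper)."""
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        match = THREAD_LINE.match(line)
        if not match:
            continue  # stack frames and headers are ignored
        pool_match = POOL.match(match.group("name"))
        # Threads that do not follow the pool naming are keyed by their full name.
        pool = pool_match.group("pool") if pool_match else match.group("name")
        totals[pool] += float(match.group("pct"))
    return dict(totals)


sample = """\
   88.3% (441.3ms out of 500ms) cpu usage by thread 'elasticsearch[datapoints_default_cs1-r52xl-data3][http_server_worker][T#7]'
    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'
"""
print(cpu_by_pool(sample))
```

Sorting the result by value makes it easy to see which pools dominate on each node.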

I am attaching screenshots of the CPU usage graphs, as well as the hot threads results.

It seems there is a bug in the X-Pack auth system that consumes a lot of CPU.

Hot threads result 2 (this is only part of the result; see the link for the full output): https://slack-files.com/T02FYRSTM-FRSDW5TCN-51434a66f4
Hot threads at 2019-12-15T13:52:24.340Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

    java.base@12.0.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
       java.base@12.0.1/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
       app//org.elasticsearch.bootstrap.Bootstrap$1.run(Bootstrap.java:83)
       java.base@12.0.1/java.lang.Thread.run(Thread.java:835)

    0.0% (0s out of 500ms) cpu usage by thread 'Common-Cleaner'
     10/10 snapshots sharing following 5 elements
       java.base@12.0.1/java.lang.Object.wait(Native Method)
       java.base@12.0.1/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
       java.base@12.0.1/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
       java.base@12.0.1/java.lang.Thread.run(Thread.java:835)
       java.base@12.0.1/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)

::: {datapoints_default_cs1-r52xl-data3}{KjC3JfvLS7WDN_u-VckRVw}{bbC4KJecRruQeyoHP6pu3Q}{172.31.32.90}{172.31.32.90:9300}{di}{rack_id=dpp, xpack.installed=true}
   Hot threads at 2019-12-15T13:52:24.342Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

   88.3% (441.3ms out of 500ms) cpu usage by thread 'elasticsearch[datapoints_default_cs1-r52xl-data3][http_server_worker][T#7]'
     9/10 snapshots sharing following 285 elements

CPU usage is high with X-Pack security enabled in any mode, and low without X-Pack security.

      app//org.apache.lucene.util.automaton.CharacterRunAutomaton.run(CharacterRunAutomaton.java:48)
       org.elasticsearch.xpack.core.security.support.Automatons$1.test(Automatons.java:219)
       org.elasticsearch.xpack.core.security.support.Automatons$1.test(Automatons.java:216)
       org.elasticsearch.xpack.security.authz.RBACEngine.resolveAuthorizedIndicesFromRole(RBACEngine.java:475)
       org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:312)

The root of the issue appears to be related to the roles your users have, and resolving their index patterns against the indices and aliases in your cluster.

Given

I suspect you are doing something unusual here.
Can you provide more detail of your roles and the aliases you are using?
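To make the suspected cost concrete, here is a deliberately simplified model of what the stack trace suggests: for each request, every index and alias name is tested against the role's index patterns. This is not the actual Elasticsearch code (the trace shows Lucene automatons, whereas this sketch uses `fnmatch`), and the function name, the index and alias names, and the `*` pattern are all assumptions for illustration.

```python
from fnmatch import fnmatchcase


def resolve_authorized_names(role_patterns, index_names, alias_names):
    """Return every index or alias name matched by at least one role pattern.

    Simplified stand-in for the per-request resolution seen in the trace;
    the work grows with (indices + aliases) x patterns.
    """
    authorized = []
    for name in list(index_names) + list(alias_names):
        if any(fnmatchcase(name, pattern) for pattern in role_patterns):
            authorized.append(name)
    return authorized


# Hypothetical names sized like the cluster described: 100 indices, 10,000 aliases.
indices = [f"client_{i}_45m_201952" for i in range(100)]
aliases = [f"w_alias_{i}" for i in range(10_000)]

# Assumed: a role whose pattern matches everything.
authorized = resolve_authorized_names(["*"], indices, aliases)
print(len(authorized))
```

In this model a single request touches all 10,100 names, so at thousands of bulk requests per minute the matching alone adds up quickly.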

Hi @TimV, thanks for looking into this.
I am only using the built-in elastic user.
As for indices and aliases: our indices rotate weekly, and the index names look like client_name_45m_201952, client_name2_48m_201952, etc.
The aliases look like this:
client_name_45m_201952: {"w_alias_name": {"filter": {"term": {"primaryName": "value"}}}}
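For reference, a filtered alias of this shape could be created with a request body for the `_aliases` API along these lines. This is a sketch only: the helper name `filtered_alias_action` is made up, and the index, alias, field, and value are simply the ones from the example above.

```python
import json


def filtered_alias_action(index, alias, field, value):
    """Build one "add" action for the _aliases API with a term filter (hypothetical helper)."""
    return {
        "add": {
            "index": index,
            "alias": alias,
            "filter": {"term": {field: value}},
        }
    }


# Body to send as POST /_aliases; one action per alias.
body = {
    "actions": [
        filtered_alias_action(
            "client_name_45m_201952", "w_alias_name", "primaryName", "value"
        )
    ]
}
print(json.dumps(body, indent=2))
```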

The numbers I mentioned, 100 indices and 10,000 aliases, are correct, and we roll over indices weekly.

Can anyone explain in detail how the ACL works?
It seems the issue is caused by the ACL.
