Using ECK, Elasticsearch does not start up after enabling SAML

We are trying to enable SAML on our ELK stack on Kubernetes. We are using ECK and custom resource definitions to manage and run our Elastic cluster.

Instructions followed: "Set up SAML with Azure Active Directory" (Elasticsearch Service documentation, Elastic)

Our cluster setup:

  • 3x master nodes
  • 2x hot data nodes
  • 2x warm data nodes
  • 2x content data nodes
  • 1x miscellaneous node for all other roles not explicitly used, as well as a 3rd content store

I'm 90% convinced we missed a config setting, but we have been unable to find it so far. We first added these xpack entries to Kibana, which failed. Then we moved them to a Kubernetes secret and passed it through as a secureSetting for Elasticsearch, but the error on startup indicated they should be part of the normal config.

Error Output:

"message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"elk-es-master-1-0","elasticsearch.cluster.name":"elk",
"error.type":"java.lang.IllegalStateException","error.message":"security initialization failed",
"error.stack_trace":"java.lang.IllegalStateException: security initialization failed
    org.elasticsearch.server@8.9.1/org.elasticsearch.node.Node.lambda$new$16(Node.java:733)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.Security.createComponents(Security.java:641)
    org.elasticsearch.server@8.9.1/org.elasticsearch.plugins.PluginsService.lambda$flatMap$1(PluginsService.java:261)
    java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
    java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    java.base/java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:722)
    java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
    java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
    java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
    java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622)
    java.base/java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627)
    org.elasticsearch.server@8.9.1/org.elasticsearch.node.Node.<init>(Node.java:748)
    org.elasticsearch.server@8.9.1/org.elasticsearch.node.Node.<init>(Node.java:334)
    org.elasticsearch.server@8.9.1/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:234)
    org.elasticsearch.server@8.9.1/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:234)
    org.elasticsearch.server@8.9.1/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:72)\nCaused by: java.lang.IllegalStateException: SAML requires that the token service be enabled (xpack.security.authc.token.enabled)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.authc.saml.SamlRealm.create(SamlRealm.java:204)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.authc.InternalRealms.lambda$getFactories$5(InternalRealms.java:162)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.authc.Realms.initRealms(Realms.java:287)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.authc.Realms.<init>(Realms.java:108)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.Security.createComponents(Security.java:751)
    org.elasticsearch.security@8.9.1/org.elasticsearch.xpack.security.Security.createComponents(Security.java:629)\n\t... 17 more\n"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/elk.log

Master node configuration in the YAML file:

    - name: master-1
      count: 3
      config:
        node.store.allow_mmap: false
        node.roles: ["master"]
        xpack.security.authc.realms.saml.kibana-realm.order: 2
        xpack.security.authc.realms.saml.kibana-realm.attributes.principal: nameid
        xpack.security.authc.realms.saml.kibana-realm.attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/groups"
        xpack.security.authc.realms.saml.kibana-realm.idp.metadata.path: "https://login.microsoftonline.com/########-####-####-####-##############/federationmetadata/2007-06/federationmetadata.xml?appid=########-####-#####-####-#############"
        xpack.security.authc.realms.saml.kibana-realm.idp.entity_id: "https://sts.windows.net/#########-####-####-####-############/"
        xpack.security.authc.realms.saml.kibana-realm.sp.entity_id: "https://kibana-uat.#####.##.##"
        xpack.security.authc.realms.saml.kibana-realm.sp.acs: "https://kibana-uat.#####.##.##/api/security/saml/callback"
        xpack.security.authc.realms.saml.kibana-realm.sp.logout: "https://kibana-uat.#####.##.##/logout"
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Mi
              limits:
                storage: 1Gi
            # storageClassName: local-path
          status: {}
      podTemplate:
        metadata:
          labels:
            elk-role: master
        spec:
          containers:
            - name: elasticsearch
          automountServiceAccountToken: true
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 50
                  preference:
                    matchExpressions:
                      - key: ELK
                        operator: In
                        values:
                          - "true"
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        elk-role: master
                    topologyKey: kubernetes.io/hostname

          tolerations:
            - key: ELK
              operator: Exists
              effect: NoSchedule

Kibana configuration:

spec:
  count: 1
  elasticsearchRef:
    name: elk
  secureSettings:
  - secretName: kibana-saml
  http:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
    tls:
      selfSignedCertificate:
        disabled: true
  podTemplate:
    metadata:
      creationTimestamp: null
    spec:
      automountServiceAccountToken: true
      containers:
        - name: kibana
          resources:
            limits:
              cpu: 500m
              memory: 2Gi
            requests:
              cpu: 50m
              memory: 100Mi
  version: 8.9.1

Contents of kibana-saml:

xpack.security.authc.providers.saml.kibana-realm.order= 0
xpack.security.authc.providers.saml.kibana-realm.realm= kibana-realm
xpack.security.authc.providers.saml.kibana-realm.description= "Log in with Azure AD"

Ok, except for feeling like an idiot, we found the solution minutes after posting on this forum.

We needed to enable HTTPS communication between nodes, and we needed the following in the Elasticsearch config:

xpack.security.authc.token.enabled: true
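
As a rough sketch, this is where that line sits relative to the SAML realm settings in the master nodeSet from the question. It is trimmed to the relevant keys and is not a complete manifest. Whether the same entry has to be repeated on every other nodeSet is an assumption here, so check that against the ECK docs.

```yaml
# Sketch only: the master nodeSet from the question, trimmed to the relevant keys.
- name: master-1
  count: 3
  config:
    node.store.allow_mmap: false
    node.roles: ["master"]
    # The setting named in the startup error; the SAML realm refuses to load without it.
    xpack.security.authc.token.enabled: true
    xpack.security.authc.realms.saml.kibana-realm.order: 2
    xpack.security.authc.realms.saml.kibana-realm.attributes.principal: nameid
    # ...remaining kibana-realm settings unchanged from the question
```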

You should also refer to the ECK docs for SAML:

I submitted a few patches to that doc.
