Auth to ECK using Azure AAD SAML failing

Hey guys,

I am facing a weird issue. Using the SSO / Azure AD setup guide as a roadmap, and mostly following https://www.elastic.co/blog/saml-based-single-sign-on-with-elasticsearch-and-azure-active-directory, I have SAML auth that mostly works.

If I load the Kibana page, it redirects me correctly to the Azure auth portal. I enter my credentials (twice? once in the login form, then a pop-up appears asking for them again) and I am forwarded to Elastic and can do things. If I click the logout URL, I run into problems: either I do not get redirected or my session is not processed correctly, because when I get back to Kibana I only get this:

{"statusCode":401,"error":"Unauthorized","message":"Unauthorized"}

Now, no matter how I try again, I only get this response. Short of restarting the pods (did I mention this is on ECK?), clearing all of my browsing history, or getting the password reset in AAD, I cannot log in using SAML. If I go to Elastic through the online user portal, where I see my application, and try to log in there, I can sometimes log in, but after I close the Elastic tab and load it up again, I still get error 401.

I also caught:
{"statusCode":500,"error":"Internal Server Error","message":"[security_exception] Authenticating realm saml_aad does not exist"}
but I think that was perhaps a mistake in my config; I only saw it twice.

I also sometimes catch this error:

My Elasticsearch config is:

      xpack.security.authc.api_key.enabled: true
      xpack.security.authc.token.enabled: true
      xpack.security.authc.realms.native.native1:
        order: 0
      xpack.security.authc.realms.saml.saml_aad:
        order: 1
        idp.metadata.path: "https://login.microsoftonline.com/1234/federationmetadata/2007-06/federationmetadata.xml?appid=1234"
        idp.entity_id: "https://sts.windows.net/1234/"
        sp.entity_id:  "https://kibana.juhu.com:5601"
        sp.acs: "https://kibana.juhu.com:5601/api/security/v1/saml"
        sp.logout: "https://kibana.juhu.com:5601/logout"
        attributes.principal: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name"
        attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/role"
        attributes.name: "http://schemas.microsoft.com/identity/claims/displayname"
        attributes.mail: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"

and my Kibana config is:

     server.xsrf.whitelist: [ /api/security/v1/saml ]
     xpack.security.public.protocol: "https"
     xpack.security.public.hostname: "kibana.juhu.com"
     xpack.security.public.port: "5601"
     xpack.security.authc.providers:
       basic.basic1:
         order: 0
         hint: "ES Local"
       saml.saml1:
         order: 1
         realm: saml_aad
         description: "ES AAD"

Any thoughts on what I am missing? I configured a simple Azure Enterprise Application following the guide linked above, and since it sometimes works and sometimes does not, I am puzzled about how to proceed. TIA

Hi @damjank,

If you're using Elastic Stack 7.7.0+, I don't think you need all of the parts that you have in your Kibana config. I think it just needs to be:

xpack.security.authc.providers:
    basic.basic1:
        order: 0
        hint: "ES Local"
    saml.saml1:
        order: 1
        realm: saml_aad
        description: "ES AAD"
        icon: "logoAzure"
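
Since this is on ECK, here is a minimal sketch of where that block might sit in the Kibana resource, under `spec.config`. The resource names and the exact patch version are placeholders and are not taken from this thread:

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-example        # hypothetical name
spec:
  version: 7.8.0              # assumed patch version; adjust to your stack
  count: 1
  elasticsearchRef:
    name: es-example          # hypothetical; must match your Elasticsearch resource
  config:
    # Only the providers block; no server.xsrf or xpack.security.public settings
    xpack.security.authc.providers:
      basic.basic1:
        order: 0
        hint: "ES Local"
      saml.saml1:
        order: 1
        realm: saml_aad       # must match the realm name configured in Elasticsearch
        description: "ES AAD"
        icon: "logoAzure"
```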

ES is at version 7.8. So you are suggesting that I remove server.xsrf... and all of the xpack.security settings up to providers, right? I will try that and report back.

EDIT: after amending the deployment, the issue is still the same. We can log in, but after logout we get:
{"statusCode":500,"error":"Internal Server Error","message":"[security_exception] Authenticating realm saml_aad does not exist"}

We reload, log in, and get:
{"statusCode":401,"error":"Unauthorized","message":"Unauthorized"}

I'm not sure what's going on here. Do the kibana logs show anything relevant?

I will get the logs from the pods themselves, or log in to one and get them there. Also, just to confirm: the ES configuration for SAML is only needed on master nodes, right?

The Elasticsearch configuration for the SAML realm must be in the configuration of all nodes, as far as I am aware. If it isn't, that could explain why you are seeing intermittent failures.

Even the data nodes? OK, I will try this as well.

Yes, all nodes - data, master, coordinating, etc.

I can confirm that after adding the auth configuration to ALL nodes, it is working as expected. The intermittent failures happened because a node participating in auth did not have the correct configuration. I did not know that ALL nodes, regardless of type, take part in that. Thanks for all the help! Cheers!
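
For anyone finding this later, here is a minimal sketch of what "on all nodes" can look like in an ECK manifest: the same realm block repeated under `config` of every nodeSet. The resource name, nodeSet names, counts, and exact patch version are made up for illustration; the realm settings are the ones posted earlier in this thread.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-example                 # hypothetical name
spec:
  version: 7.8.0                   # assumed patch version
  nodeSets:
  - name: master                   # hypothetical nodeSet
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
      xpack.security.authc.api_key.enabled: true
      xpack.security.authc.token.enabled: true
      xpack.security.authc.realms.native.native1:
        order: 0
      xpack.security.authc.realms.saml.saml_aad:
        order: 1
        idp.metadata.path: "https://login.microsoftonline.com/1234/federationmetadata/2007-06/federationmetadata.xml?appid=1234"
        idp.entity_id: "https://sts.windows.net/1234/"
        sp.entity_id: "https://kibana.juhu.com:5601"
        sp.acs: "https://kibana.juhu.com:5601/api/security/v1/saml"
        sp.logout: "https://kibana.juhu.com:5601/logout"
        attributes.principal: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name"
        attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/role"
        attributes.name: "http://schemas.microsoft.com/identity/claims/displayname"
        attributes.mail: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
  - name: data                     # hypothetical nodeSet
    count: 2
    config:
      node.master: false
      node.data: true
      node.ingest: true
      # Same security settings as the master nodeSet; keep the two blocks identical
      xpack.security.authc.api_key.enabled: true
      xpack.security.authc.token.enabled: true
      xpack.security.authc.realms.native.native1:
        order: 0
      xpack.security.authc.realms.saml.saml_aad:
        order: 1
        idp.metadata.path: "https://login.microsoftonline.com/1234/federationmetadata/2007-06/federationmetadata.xml?appid=1234"
        idp.entity_id: "https://sts.windows.net/1234/"
        sp.entity_id: "https://kibana.juhu.com:5601"
        sp.acs: "https://kibana.juhu.com:5601/api/security/v1/saml"
        sp.logout: "https://kibana.juhu.com:5601/logout"
        attributes.principal: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name"
        attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/role"
        attributes.name: "http://schemas.microsoft.com/identity/claims/displayname"
        attributes.mail: "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
```

If a request lands on a node whose nodeSet lacks the `saml_aad` realm, that would be consistent with the "Authenticating realm saml_aad does not exist" error seen above.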
