SAML AD ADFS yaml settings, troubleshooting and role mapping notes and insight (Solved)

Elasticsearch 6.5.4
Linux RPM

First, we appreciate the help of everyone who has posted useful troubleshooting methods and information in the forums, and we thank Cyberyllium (cyberyllium.io) and Thundercat Tech (thundercattech.com) for the extra hand in getting us up and running. Hopefully this guide will help others with SAML as well.

Reference Docs
Elasticsearch settings:

Kibana settings:

Very useful reference listing known SAML implementation error messages/issues and how to troubleshoot them, at least to a certain extent. #5 was one of our problems, and it turned out to be an ADFS settings issue:

Useful reference in the discussion forum:

SAML blog that gave us a basic blueprint to work off of:

Working SAML sections of our elasticsearch.yml and kibana.yml:

#--------------------------------- ES SAML --------------------------------------------
#
##Enable the token service for SAML 12/11/18 - Tim
xpack.security.authc.token.enabled: true
###These are the settings needed to configure and implement for SAML authentication. -Ryan
xpack.security.authc.realms.saml1:
  type: saml
  order: 1
  idp.metadata.path: "https://xxx.xxx.xxx.gov/FederationMetadata/2007-06/FederationMetadata.xml"
  idp.entity_id: "http://xxx.xxx.xxx.gov/adfs/services/trust"
  sp.entity_id: "https://kibana.gov"
  sp.acs: "https://kibana.gov:443/api/security/v1/saml"
  sp.logout: "https://kibana.gov/logout"
  attributes.principal: "nameid"
  attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/role"
  nameid_format: "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"

#--------------------------- Kibana SAML ----------------------------------------------
###Below is additional information that may be needed to incorporate SAML.
xpack.security.sessionTimeout: 1800000
xpack.security.authProviders: [saml, basic]
server.xsrf.whitelist: [/api/security/v1/saml]
###If your Kibana instance is behind a proxy, you may also need to add configuration to tell Kibana
###how to form its public URL.
#xpack.security.public:
# protocol: https
# hostname: kibana-qa.svc.ny.gov
# port: 443
xpack.security.public.protocol: "https"
xpack.security.public.hostname: "kibana.gov"
xpack.security.public.port: "443"
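
The sp.acs value in elasticsearch.yml and the xpack.security.public.* values in kibana.yml describe the same Kibana endpoint, so it can be worth checking that they agree. Below is a minimal Python sketch of such a check, not an official tool. The helper names (expected_acs, same_endpoint) are made up for illustration, and treating an omitted port as the scheme default is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Path Kibana uses for the SAML ACS endpoint in the settings above.
KIBANA_SAML_PATH = "/api/security/v1/saml"


def expected_acs(protocol: str, hostname: str, port: str) -> str:
    """Build the ACS URL implied by the xpack.security.public.* settings."""
    return f"{protocol}://{hostname}:{port}{KIBANA_SAML_PATH}"


def same_endpoint(url_a: str, url_b: str) -> bool:
    """Compare two URLs, treating an omitted port as the scheme default."""
    default_ports = {"http": 80, "https": 443}

    def normalize(url: str):
        parts = urlsplit(url)
        port = parts.port or default_ports.get(parts.scheme)
        return (parts.scheme, parts.hostname, port, parts.path)

    return normalize(url_a) == normalize(url_b)


# Values copied from the two YAML sections above.
sp_acs = "https://kibana.gov:443/api/security/v1/saml"
kibana_acs = expected_acs("https", "kibana.gov", "443")
print(same_endpoint(sp_acs, kibana_acs))
```

If this prints False, the IdP is likely being told to post the SAML response to a URL that Kibana does not consider its own.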

I believe the original error, which I forgot to record, was caused by having the wrong attributes.principal defined, and potentially nothing more on the ES YAML side. For this field, it seems that nameid or nameid:persistent will be the correct value for most deployments. Unfortunately, we tried several combinations of the nameid_format and attributes.principal fields, so we hit a few different errors along the way.

We figured out which nameid_format value was correct for our deployment when error #5 in the Common SAML Issues link started showing in our ES logs: "Authentication to realm my-saml-realm failed - Provided SAML response is not valid for realm saml/my-saml-realm (Caused by ElasticsearchSecurityException[SAML Response is not a 'success' response: Code=urn:oasis:names:tc:SAML:2.0:status:AuthnFailed Message=null Detail=null])". We took this error to mean that ADFS didn't like the SAML version we were using, and ADFS was reporting the same thing on its end, so we went back to urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified.

Once that error was fixed, the value we were using for attributes.principal looked like the remaining problem. Based on the info our colleagues provided and the settings in the thread listed above, we changed attributes.principal to "nameid", which worked in the sense that it gave us new errors to troubleshoot. Specifically, ADFS was now reporting different errors, and when we tried to sign in via Chrome we received this error: {"message":"action [indices:data/read/search] is unauthorized for user [username@.gov]: [security_exception] action [indices:data/read/search] is unauthorized for user [username@.gov]","statusCode":403,"error":"Forbidden"}. This is a good time to use the SAML Message Decoder app in Chrome. It gave us insight into what was being passed in the session, such as kibana_gov, and therefore the ability to troubleshoot what was working and what wasn’t.
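
For anyone who can't install a browser extension, roughly the same information can be pulled out of a captured SAMLResponse form value with a few lines of Python. This is only a sketch under some assumptions: the response was sent via the HTTP-POST binding (plain base64, not deflated), the assertion is not encrypted, and the sample XML below is made up for illustration (user@example.org is a placeholder). The function name summarize_saml_response is hypothetical as well.

```python
import base64
import xml.etree.ElementTree as ET

# Standard SAML 2.0 XML namespaces.
NS = {
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
}


def summarize_saml_response(b64_response: str) -> dict:
    """Decode a base64 SAMLResponse and pull out the fields we cared about."""
    root = ET.fromstring(base64.b64decode(b64_response))
    status = root.find("samlp:Status/samlp:StatusCode", NS)
    name_id = root.find(".//saml:NameID", NS)
    attributes = {}
    for attr in root.iterfind(".//saml:Attribute", NS):
        values = [v.text for v in attr.findall("saml:AttributeValue", NS)]
        attributes[attr.get("Name")] = values
    return {
        "status": status.get("Value") if status is not None else None,
        "name_id": name_id.text if name_id is not None else None,
        "name_id_format": name_id.get("Format") if name_id is not None else None,
        "attributes": attributes,
    }


# Made-up sample response, only to show the shape of the output.
sample_xml = (
    '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
    'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
    '<samlp:Status><samlp:StatusCode '
    'Value="urn:oasis:names:tc:SAML:2.0:status:Success"/></samlp:Status>'
    '<saml:Assertion><saml:Subject><saml:NameID '
    'Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">'
    'user@example.org</saml:NameID></saml:Subject>'
    '<saml:AttributeStatement><saml:Attribute '
    'Name="http://schemas.microsoft.com/ws/2008/06/identity/claims/role">'
    '<saml:AttributeValue>kibana_gov</saml:AttributeValue>'
    '</saml:Attribute></saml:AttributeStatement>'
    '</saml:Assertion></samlp:Response>'
)
summary = summarize_saml_response(base64.b64encode(sample_xml.encode()).decode())
print(summary["name_id_format"])
print(summary["attributes"])
```

The NameID format and the attribute names printed here are what nameid_format, attributes.principal and attributes.groups in elasticsearch.yml have to line up with.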
(Info continues here: SAML AD ADFS yaml settings troubleshooting and role mapping notes and insight part II (Solved))


This is awesome @Ryan_Downey, thanks for taking the time to document your experience. I'll add a few clarifications in case they prove helpful for others too.

Regarding the AuthnFailed error from your ES logs, I would expect that error to be: [SAML Response is not a 'success' response: Code=urn:oasis:names:tc:SAML:2.0:status:InvalidNameIDPolicy Message=null Detail=null]

We have recently updated our documentation to explain why this can happen; see #5 in Common SAML issues (Elasticsearch Guide [master]).

On ADFS not liking the SAML version: this is not about the SAML version. This is still SAML 2.0, which is the only version of the standard that the Elastic Stack supports. It is about the format of the NameID. The URN for the unspecified format didn't change between SAML 1.1 and SAML 2.0 and remained urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified. By default, ADFS releases NameIDs with the urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified format instead of the urn:oasis:names:tc:SAML:2.0:nameid-format:persistent format.

Browser plugins that decode SAML messages are very helpful for troubleshooting. I just wanted to point out that this information is also logged at TRACE level in the elasticsearch.log, as discussed in the last bullet in Common SAML issues (Elastic Stack Overview [7.4]).
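
As a rough illustration of turning that logging on, the Python sketch below only builds and prints the request body for the cluster settings API. It does not contact a cluster. The logger name org.elasticsearch.xpack.security.authc.saml is my assumption of the relevant package, so check it against the Common SAML issues page for your version. The helper name logger_settings_body is made up.

```python
import json

# Assumed logger for the SAML realm; verify against the docs for your version.
SAML_LOGGER = "logger.org.elasticsearch.xpack.security.authc.saml"


def logger_settings_body(level="TRACE", persistent=False):
    """Build a cluster settings body that changes the SAML logger level.

    Passing level=None produces a JSON null, which resets the logger
    to its default level.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {SAML_LOGGER: level}}


# Body to send with: PUT /_cluster/settings
print(json.dumps(logger_settings_body(), indent=2))

# Body that resets the logger once troubleshooting is done.
print(json.dumps(logger_settings_body(level=None), indent=2))
```

TRACE output is verbose and includes the full SAML messages, so it is best left on only while reproducing the failed login.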


Ioannis,
Appreciate the kind words and extra insight/clarification of this. It took some time to write up but hopefully it'll help others get their environments up and running. Enjoy the rest of your day!
Ryan
