Version 6.5.4
9 node ES cluster
3 node Kibana cluster
I recently had a problem implementing TLS within our cluster. After implementing TLS in our QA environment and taking a lot of notes along the way, I thought that deploying it to our prod environment would be fairly easy. Initially I followed the steps I had documented for our QA environment and created our CA and certs from those notes, changing node names etc. for the prod environment.
After deploying the certs and configuring the YAML files in our prod environment, I figured that things would connect fairly quickly. Unfortunately it didn't work out so easily. I started with just one ES node, abc.gov (xpack/TLS enabled), and the Kibana node it's paired with through the Kibana YAML file, 234.gov (xpack/TLS enabled). This didn't work: I kept getting a "Failed to authenticate user Kibana" error message, which I took to mean that the kibana user's password wasn't being accepted. I turned on three of the ES nodes in the cluster (xpack/TLS enabled) to see if that would help. No luck. I turned on all of the ES nodes, no luck.
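One way to separate a password problem from a TLS or cluster problem is to ask Elasticsearch directly whether it accepts the kibana user's credentials, bypassing Kibana entirely. A minimal sketch with Python's standard library is below, calling the 6.x authenticate endpoint (`/_xpack/security/_authenticate`). It assumes TLS is enabled on the HTTP layer as well, the default HTTP port 9200, and a PEM copy of the CA certificate; the host, port, and CA path are placeholders, not values from our setup.

```python
"""Ask Elasticsearch directly whether it accepts a user's credentials.

Sketch only: host, port, user and CA path are placeholders.
"""
import base64
import json
import ssl
import urllib.error
import urllib.request

# Authenticate API path in 6.x; later releases use /_security/_authenticate.
AUTH_PATH = "/_xpack/security/_authenticate"


def build_auth_request(host: str, port: int, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"https://{host}:{port}{AUTH_PATH}")
    request.add_header("Authorization", f"Basic {token}")
    return request


def check_credentials(host: str, port: int, user: str, password: str, ca_file: str):
    """Return (HTTP status, body) from the authenticate endpoint.

    The server certificate is verified against ca_file (a PEM CA certificate).
    """
    context = ssl.create_default_context(cafile=ca_file)
    request = build_auth_request(host, port, user, password)
    try:
        with urllib.request.urlopen(request, context=context, timeout=10) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # 401 means Elasticsearch did not authenticate the request.
        return err.code, err.read().decode("utf-8", errors="replace")
```

For example, `check_credentials("abc5app.gov", 9200, "kibana", password, "ca.crt")` returning status 200 would point away from the password itself, while an exception raised before any HTTP response (such as a certificate verification failure wrapped in `urllib.error.URLError`) would point at the certificates instead.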
I did some troubleshooting, double-checking certs, YAML syntax, etc., with no luck. I then stumbled across an alert in our Kibana Alerts section that said "Low - Not resolved - Configuring TLS will be required to apply a Gold or Platinum license when security is enabled." As we had just updated our license, I thought there might be a license issue. In the end there wasn't, but this sent me down the proverbial rabbit hole thinking the license was messed up for some reason.
After troubleshooting the license issue and determining that it probably wasn't the problem, I went looking for other possible issues. I ran across two other potential problems. The first is that we don't necessarily have dedicated master nodes, although our three Kibana nodes are linked to abc5app.gov, def5app.gov and ghi5app.gov. Our system is set up with discovery.zen.minimum_master_nodes: 3, but everything else is left at the default node.master setting, so no nodes are specifically dedicated as masters, if I'm understanding this correctly.
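For reference, the 6.x Zen discovery guidance is to set discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes, (N / 2) + 1. Assuming all nine ES nodes are master-eligible because node.master was left at its default of true, that arithmetic can be sketched as follows (the helper names are hypothetical):

```python
def zen_quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes, (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def check_minimum_master_nodes(setting: int, master_eligible: int) -> str:
    """Compare a minimum_master_nodes value with the recommended majority."""
    quorum = zen_quorum(master_eligible)
    if setting > master_eligible:
        return f"{setting} exceeds the {master_eligible} master-eligible nodes: no master can be elected"
    if setting < quorum:
        return f"{setting} is below the majority of {quorum}: two masters could be elected (split brain)"
    if setting == master_eligible and master_eligible > 1:
        return f"{setting} requires every master-eligible node to be up to elect a master"
    return f"{setting} meets the majority of {quorum}"


if __name__ == "__main__":
    # All nine nodes master-eligible (node.master left at its default of true).
    print(check_minimum_master_nodes(3, 9))
    # Three dedicated master nodes instead.
    print(check_minimum_master_nodes(3, 3))
```

By that rule a nine-node, all-eligible cluster would want 5 rather than 3, while three dedicated masters would want 2. The same setting also means a node started on its own cannot elect a master until it can see at least minimum_master_nodes master-eligible nodes (itself included), which matters when bringing nodes up one at a time.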
The second issue was that our discovery.zen.ping.unicast.hosts entries were set up as ABC5APP, DEF5APP, GHI5APP, etc., yet when we created our certs we just put in abc5app, def5app, ghi5app, etc. So the names don't match character for character.
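As a point of comparison: DNS names are case-insensitive (RFC 4343), and the usual TLS hostname check (RFC 6125) compares them case-insensitively, so ABC5APP and abc5app should match the same SAN entry, whereas a short name and a fully qualified name (abc5app vs abc5app.gov) are different names. Whether Elasticsearch's Java TLS stack behaves exactly this way is an assumption here. The sketch below is a simplified model of exact-match SAN checking (no wildcards, no IP SANs), not Elasticsearch's implementation, and its function names are hypothetical.

```python
from typing import Iterable


def normalize(name: str) -> str:
    """DNS names compare case-insensitively; drop a trailing root dot too."""
    return name.rstrip(".").lower()


def san_matches(hostname: str, dns_sans: Iterable[str]) -> bool:
    """Exact (non-wildcard) match of a host name against a certificate's DNS SANs."""
    target = normalize(hostname)
    return any(normalize(san) == target for san in dns_sans)


if __name__ == "__main__":
    print(san_matches("ABC5APP", ["abc5app"]))                     # True: only case differs
    print(san_matches("abc5app.gov", ["abc5app"]))                 # False: short name vs FQDN
    print(san_matches("ABC5APP.gov", ["abc5app", "abc5app.gov"]))  # True
```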
After recreating the certs with all of our possible hostname variations and deploying them, I went back to trying to connect abc5app.gov with Kibana 234app.gov, no luck. I then turned on the three ES nodes that are linked to the Kibana nodes and turned on all three Kibana nodes, no luck. After turning all of the nodes in our cluster back on with the new certs containing every hostname variation, everything linked up. So, after this multi-paragraph report, my questions are:
- Is my assumption correct that, in order to implement TLS, your cluster needs access to a master node? I'm not sure I'm phrasing that correctly, but it seems like this could have been an issue for our TLS deployment. Could the lack of access to a master node have been causing the "Failed to authenticate" error?
- How important is it to have dedicated master nodes, rather than letting ES just pick one? Our environments are still new but will grow later, so it seems like establishing master nodes now will help us as we implement a hot-warm architecture in the future.
- As far as syntax goes for cert creation, how closely does the --dns field have to match the hosts listed in discovery.zen.ping.unicast.hosts, i.e., abc5app vs ABC5APP?
- Is there a way to test certs server to server with curl, so that you can tell whether they are the issue or not? I've tried a few things I found online, to no avail.
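A node-to-node check can be approximated by opening a TLS connection to another node's transport port while presenting a node certificate, which is roughly what a peer node does. The sketch below uses Python's ssl module instead of curl. It assumes PEM copies of the CA certificate, a node certificate and its key (the file names are hypothetical), the default transport port 9300, and that transport TLS asks for a client certificate. Because hostname checking is on, the handshake should fail if the name you connect with is not in the target node's certificate.

```python
import socket
import ssl


def dns_sans(peer_cert: dict) -> list:
    """Return the DNS entries from a dict produced by SSLSocket.getpeercert()."""
    return [value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"]


def probe_node(host: str, port: int, ca_file: str, cert_file: str, key_file: str) -> dict:
    """TLS handshake with a node, presenting our own node certificate like a peer would."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Client certificate and key (PEM); transport TLS normally asks for one.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # check_hostname is on by default, so the handshake fails if `host`
    # is not covered by the target node's certificate.
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "tls_version": tls.version(),
                "cipher": tls.cipher()[0],
                "dns_sans": dns_sans(tls.getpeercert()),
            }


def probe_all(hosts, port: int, ca_file: str, cert_file: str, key_file: str) -> dict:
    """Probe several nodes and keep either the handshake details or the error text."""
    results = {}
    for host in hosts:
        try:
            results[host] = probe_node(host, port, ca_file, cert_file, key_file)
        except ssl.SSLError as err:
            results[host] = f"TLS error: {err}"
        except OSError as err:
            results[host] = f"connection error: {err}"
    return results
```

For example, `probe_all(["abc5app.gov", "ABC5APP"], 9300, "ca.crt", "node.crt", "node.key")` would show, per name, whether the handshake and hostname check pass. A successful handshake mainly shows that the target's certificate chains to the CA and covers that name; depending on the TLS version, a rejected client certificate may only surface after the handshake.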