io:Get "https://a.corp.internal": x509: certificate is valid for a.corp.internal, not b.corp.internal
This happens for every monitor that has multiple hosts with differing FQDNs.
Monitors with a single host, or monitors with multiple hosts on the same FQDN, don't have this issue.
My setup was working fine for months until I updated from 7.16.3 to 7.17.0 today.
Update: couldn't find the cause of the issue. Reverting back to 7.16.3 worked.
Sorry to hear you're hitting this issue. It's a tricky one to debug because I'm having trouble replicating it. I tried to do so with the following config:
Can you replicate this behavior against any public sites so that we could reproduce it? The strange thing here is that 7.17.0 doesn't contain any changes that should impact this AFAIK.
Reading through the error, it sounds like Heartbeat is somehow mixing up the cert for one endpoint with that of another. However, I would have expected my replication attempt to reveal that same issue.
All it takes to trigger the behavior is at least two FQDNs (in the same monitor) which share a domain+TLD but have their own (non-wildcard) certificates.
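For anyone trying to reproduce this, a monitor matching that description might look roughly like the sketch below. The hostnames are the ones from the error reports in this thread; the `id`, `name`, and `schedule` values are made up for illustration, and the real monitor files may differ.

```yaml
heartbeat.monitors:
- type: http
  id: corp-status            # hypothetical id, not from the original config
  name: Corp status pages    # hypothetical name
  schedule: '@every 30s'     # assumed interval
  # Two FQDNs under the same domain+TLD, each serving its own
  # non-wildcard certificate, checked by a single monitor.
  hosts:
    - "https://jira.corp.internal/status"
    - "https://bitbucket.corp.internal/status"
```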
The reports will alternate between:
io:Get "https://jira.corp.internal/status": x509: certificate is valid for jira.corp.internal, not bitbucket.corp.internal
and
io:Get "https://bitbucket.corp.internal/status": x509: certificate is valid for bitbucket.corp.internal, not jira.corp.internal
So you are right, Heartbeat (or Elastic) is mixing up the certs between endpoints. And it's not constant either: it alternates in some unknown fashion.
It's very hard for me to provide an example with open-internet URLs, since my corporate network has a TLS interceptor, which obfuscates a lot.
But perhaps this would work for you (provided both have their own certificates, not shared wildcard cert).
Like I said, they have to share the same domain+TLD for the problem to occur.
Using heartbeat 7.16.3 instead of 7.17.0 immediately fixes the issue (with the Elasticsearch version being constant at 7.17.0).
@PayBas I managed to reproduce the issue as you described, it looks to be a very specific edge case that involves domains with a common suffix and using non-wildcard certificates.
Workaround 2 (splitting monitors) is not practical for me. This would make my monitor files hundreds of lines long :). But I'll definitely look into ssl.verification_mode: certificate.
Otherwise I'll stick with 7.16.3 for now. The fact that I appear to be the only one running into this 7.17.0 bug suggests that it is indeed an edge case (one which hopefully will be addressed in a future release).
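For reference, the `ssl.verification_mode: certificate` workaround mentioned above would presumably be applied per monitor, something like the sketch below. The `id` and `schedule` are hypothetical placeholders. Note that `certificate` mode still checks that the certificate chains to a trusted CA but skips hostname verification, so it hides the mismatch error rather than fixing the underlying mix-up.

```yaml
heartbeat.monitors:
- type: http
  id: corp-status            # hypothetical id
  schedule: '@every 30s'     # assumed interval
  hosts:
    - "https://jira.corp.internal/status"
    - "https://bitbucket.corp.internal/status"
  ssl:
    # Verify the CA chain only; do not compare the certificate's
    # names against the host being requested.
    verification_mode: certificate
```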