We are trying hard to stop Elasticsearch from seeing our Citrix ADC health monitoring of the HTTPS backend services on port 9200 as plain text, and so avoid log error events like:
received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/10.141.33.64:9200, remoteAddress=/10.94.139.12:42849}
The remote address ending in .12 is our Citrix ADC health monitoring IP.
We have tried various health monitor types: TCP-ECV + Secure:YES, HTTP-ECV + Secure:YES, and https-ecv in various versions. Testing from the ADC shell with, for example, curl -kv https://nodeIP:9200/_cluster/health?pretty returns proper cluster status JSON and completes a TLS 1.3 handshake, but the monitors keep triggering the plain-text complaint, spamming the cluster logs.
Despite the complaints in the Elasticsearch log, the monitors all deem the backend service to be UP; the only problem is the log spam.
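For comparison with the curl test, here is a minimal Python sketch of the same kind of probe, run from any host that can reach the node. It is only an illustration: the function names are made up for this example, certificate verification is switched off to mirror curl -k, and it says nothing about what the ADC monitor itself sends.

```python
import http.client
import ssl


def insecure_tls_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe_cluster_health(host: str, port: int = 9200, timeout: float = 5.0):
    """Return (negotiated TLS version, HTTP status, body) for one probe."""
    conn = http.client.HTTPSConnection(
        host, port, timeout=timeout, context=insecure_tls_context()
    )
    try:
        conn.connect()                    # TCP connect plus full TLS handshake
        tls_version = conn.sock.version() # e.g. "TLSv1.3"
        conn.request("GET", "/_cluster/health")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", "replace")
        return tls_version, resp.status, body
    finally:
        conn.close()
```

Calling probe_cluster_health("nodeIP") with a real node address should not produce the plaintext warning, since the request is only sent after the handshake. If security is enabled the node may answer 401 without credentials, which still means the TLS part worked.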
I am confused about what is generating the monitoring health checks.
My experience is that many integrations with Elasticsearch default to http, not https.
As I am sure you know, that is exactly what that message means: whatever is doing the monitoring is sending http to an https endpoint
OR
your LB is doing SSL termination on the health check URL/endpoint, so the LB receives https, terminates SSL, and then forwards http to the health check endpoint.
This probably did not help.. but perhaps I am missing something.
Technically, it just means that what ES received on the channel was not a valid TLS record; we don’t actually check to make sure it’s actually an HTTP request.
In any case this still seems like invalid traffic coming from the monitoring system, despite which it’s still finding ES to be healthy. There’s no more diagnostics available within ES itself AFAIK, you will need to look at a packet capture to understand exactly what is being sent.
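Once you have a packet capture, something like the following can help sort the first payload bytes of each monitor connection into "looks like TLS" and "looks like plain HTTP". This is a rough sketch of a TLS record header check (content type, then major version 3), not the actual check ES or Netty performs, and the function name is made up for this example.

```python
# TLS record content types as defined for the TLS record layer.
TLS_CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
    24: "heartbeat",
}

HTTP_METHODS = {b"GET", b"HEAD", b"POST", b"PUT", b"DELETE", b"OPTIONS"}


def classify_payload(data: bytes) -> str:
    """Guess whether the first bytes of a TCP payload are TLS or plain HTTP."""
    if len(data) < 5:  # a TLS record header is 5 bytes long
        return "too short to classify"
    content_type, major, minor = data[0], data[1], data[2]
    if content_type in TLS_CONTENT_TYPES and major == 3 and minor <= 4:
        return f"TLS record ({TLS_CONTENT_TYPES[content_type]})"
    if data.split(b" ", 1)[0] in HTTP_METHODS:
        return "plaintext HTTP request"
    return "neither TLS nor recognisable HTTP"
```

A ClientHello starts with byte 0x16 followed by 0x03, so a well-behaved monitor connection should classify as a handshake record on its first payload. Anything starting with "GET " on port 9200 would be the kind of traffic the log message complains about.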
ES is terminating TLS itself, so our LB only does SSL_BRIDGE for incoming requests forwarded to the ES endpoints. To know which ES backends are healthy enough to receive requests, the LB needs to monitor them, so we configure a monitor to probe the health of ES. However, in every attempt to get Citrix ADC to use a proper HTTPS probe request (e.g. GET /_cluster/health, looking for return code 200 or cluster_name in the response), ES keeps claiming that the probe is sending HTTP data before the TLS handshake finishes, or not attempting a TLS handshake at all, and so it sees HTTP on its HTTPS listening socket. The health monitors still report success, e.g. they see cluster_name in the responses, so you would believe that ES is responding as expected and that the requests must be using HTTPS with a TLS handshake. So why is ES still claiming to see data sent before the TLS handshake finishes, we wonder.
So it seems that ES closes the socket when it receives data before the TLS handshake has finished, which means the ADC must be doing more than our configured https-ecv requests, since the ADC finds that these are answered successfully. Maybe the ADC is also doing, say, a simple hidden TCP SYN-ACK check, or would that instead result in what we saw on the Fleet services:
http: TLS handshake error from 10.94.139.12:64091: EOF
This stopped on the Fleet services when we changed the monitor probe from TCP SYN-ACK to https-ecv using ‘GET /’ requests.
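To illustrate what a connect-only probe looks like from the server side, here is a minimal Python sketch, assuming the monitor simply opens a TCP connection and closes it again without sending anything (the function name is made up for this example). A TLS server on the other end never receives a ClientHello, which is consistent with the EOF handshake error seen on Fleet.

```python
import socket


def tcp_connect_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Open a TCP connection and close it immediately, sending no data."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True   # three-way handshake succeeded, port is open
    except OSError:
        return False      # refused, unreachable, or timed out
```

Note that this kind of probe sends no bytes at all, so on its own it would not explain a complaint about plaintext data arriving on the channel.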
How are you determining this? ES is closing the socket because the client doesn’t seem to be doing TLS correctly, resulting in a NotSslRecordException being thrown. I wouldn’t have expected this to be triggered by sending properly-framed data just at the wrong time, instead I think it indicates the data is not properly-framed somehow. Could be wrong, I’m not digging through the TLS implementation to work out the details, but still everything here looks to be a misbehaving client.
Yes, but that does not mean it’s receiving data prior to completing the TLS handshake (the claim in your earlier message), it just means that it’s seeing non-TLS data on this channel at some point. It might be before the TLS handshake completes, but it might also be after completing a request.
Agreed, it was a statement I most probably picked up from a conversation with ChatGPT.
I appreciate your help. It is just frustrating to debug such issues. I may have to dig into low-level utilities like nstcpdump/nstrace on the ADC nodes; that will be a first for me.