HTTP port 9200 behind Citrix ADC load balancing

We're trying desperately to stop Elasticsearch from seeing our Citrix ADC health-monitor traffic against the HTTPS port 9200 backend services as plain text, and thus avoid log error events like:

received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/10.141.33.64:9200, remoteAddress=/10.94.139.12:42849}

The remote address ending in .12 is our Citrix ADC health-monitoring IP.

We have tried various health monitor types: TCP-ECV with Secure:YES, HTTP-ECV with Secure:YES, and https-ecv in various versions. Testing from the ADC shell with e.g. curl -kv https://nodeIP:9200/_cluster/health?pretty returns the cluster status JSON just fine and completes a proper TLS 1.3 handshake, but the monitors trigger the plaintext complaints all the time, spamming the cluster logs :confused:

Despite the complaints in the Elasticsearch logs, the monitors all deem the backend services successfully UP; it's only the log being spammed.

Any hints appreciated, TIA!

Hi @stefws

I am confused about what is generating the monitoring health checks.

In my experience, many integrations with Elasticsearch default to http, not https.

As I am sure you know, that is exactly what that message means: whatever is doing the monitoring is sending http to an https endpoint.

OR

your LB is doing SSL termination on the health check URL/endpoint, and thus the LB is receiving https, terminating SSL, then forwarding http to the health check endpoint.

This probably did not help... but perhaps I am missing something.

Technically, it just means that what ES received on the channel was not a valid TLS record; we don't actually check whether it's an HTTP request at all.
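The distinction can be sketched like this (a rough Python analogue of TLS record framing, not the actual ES/Netty code): every TLS record starts with a content-type byte in the range 20–23 followed by the protocol major version 0x03, whereas a plaintext probe starts with ASCII, e.g. b"GET ".

```python
# Sketch only: classify the first bytes on a socket as TLS-like or not.
# TLS record header: content type (20=ChangeCipherSpec, 21=Alert,
# 22=Handshake, 23=ApplicationData), then version bytes 0x03 0x0X.

def looks_like_tls(first_bytes: bytes) -> bool:
    if len(first_bytes) < 3:
        return False
    content_type, major = first_bytes[0], first_bytes[1]
    return content_type in (20, 21, 22, 23) and major == 3

print(looks_like_tls(b"\x16\x03\x01\x02\x00"))  # True  - a TLS handshake record
print(looks_like_tls(b"GET /_cluster/heal"))    # False - plaintext HTTP probe
```

Anything that fails such a framing check gets the "received plaintext http traffic" treatment, whether or not it is actually HTTP.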

In any case this still seems like invalid traffic coming from the monitoring system, despite which it's still finding ES to be healthy. There are no more diagnostics available within ES itself AFAIK; you will need to look at a packet capture to understand exactly what is being sent.

ES is terminating the SSL, so our LB is only doing SSL_BRIDGE for incoming requests forwarded to the ES endpoints. But to know which ES backend endpoints are healthy to forward/LB requests towards, our LB needs to do backend monitoring, so we configure a monitor to probe the health state of ES. Yet with every attempt to get Citrix ADC to use a proper https probe request, e.g. GET /_cluster/health looking for return code 200 or cluster_name in the response, ES keeps claiming that the probe requests send http data before the TLS handshake finishes, or don't attempt a TLS handshake at all, thus putting http on an https listening socket of ES. The health monitors still report success, i.e. they do see cluster_name in the responses, so you would believe ES is responding as expected and the requests must be completing https/TLS handshakes. So why is ES still claiming to see data sent before the TLS handshake finishes, we wonder.

So it seems that ES is closing the socket upon receiving data before the TLS handshake has finished, thus the ADC must be doing more than our configured https-ecv requests, since those are successfully answered as far as the ADC can tell. Maybe the ADC is also doing, say, a simple hidden tcp syn-ack probe; would such a probe result in what we saw on the Fleet services:

http: TLS handshake error from 10.94.139.12:64091: EOF 

This stopped on the Fleet services when we altered the monitor probe from tcp syn-ack to https-ecv using 'GET /' requests.
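The syn-ack suspicion is easy to reproduce in miniature: a probe that just opens a TCP connection and closes it without sending a ClientHello leaves the TLS listener reading EOF, which matches the "TLS handshake error ...: EOF" pattern above. A hypothetical local sketch (plain sockets standing in for the TLS listener):

```python
import socket
import threading

# Sketch: a bare "connect then close" probe against a listener that, like
# a TLS server, expects to read handshake bytes first.

results = []

def fake_tls_listener(srv):
    conn, _ = srv.accept()
    # A real TLS server would try to read a 5-byte record header here.
    results.append(conn.recv(5))   # b'' means EOF: no handshake bytes arrived
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
t = threading.Thread(target=fake_tls_listener, args=(srv,))
t.start()

# The probe: open the connection, then close it without sending anything.
probe = socket.create_connection(srv.getsockname())
probe.close()
t.join()
srv.close()

print(results == [b""])  # True: the listener saw EOF, not a TLS record
```

If the ADC were sneaking in such a bare-connect probe alongside the configured ECV monitor, the listener-side symptom would look exactly like the Fleet log line.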

How are you determining this? ES is closing the socket because the client doesn’t seem to be doing TLS correctly, resulting in a NotSslRecordException being thrown. I wouldn’t have expected this to be triggered by sending properly-framed data just at the wrong time, instead I think it indicates the data is not properly-framed somehow. Could be wrong, I’m not digging through the TLS implementation to work out the details, but still everything here looks to be a misbehaving client.

I read from your code snippet that the socket, aka channel, is closed on the ES side after logging the warning.

If this is correct, then I'm just puzzled that when asking the ADC to send https ECV monitor requests, it still deems the ES services UP and healthy.

ADC config:

add ssl profile elastic_ssl_profile -sslProfileType BackEnd -eRSA DISABLED -sessReuse ENABLED -sessTimeout 300 -tls13 ENABLED -SNIEnable ENABLED

bind ssl profile elastic_ssl_profile -eccCurveName X25519_MLKEM768
bind ssl profile elastic_ssl_profile -eccCurveName X_25519
bind ssl profile elastic_ssl_profile -eccCurveName P_256
bind ssl profile elastic_ssl_profile -eccCurveName P_384
bind ssl profile elastic_ssl_profile -eccCurveName P_224
bind ssl profile elastic_ssl_profile -eccCurveName P_521

add lb monitor elastic_https HTTP-ECV -send "GET /_cluster/health HTTP/1.1\r\nHost: pseudohost.<redacted domain>\r\nAuthorization: Basic <redacted auth>\r\nConnection: close\r\n\r\n" -recv cluster_name -LRTM DISABLED -resptimeout 3 -secure YES -sslProfile elastic_ssl_profile

The elastic_https monitor probes are successfully seeing 'cluster_name' in responses, and it is the only monitor bound to probe the ES services.
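For comparison while debugging, the monitor's behaviour can be approximated outside the ADC. This is a rough, hypothetical Python analogue of the HTTP-ECV probe above (the hostname and credentials are placeholders, not values from this thread): send the probe request, optionally over TLS, and check the response for the expected token.

```python
import socket
import ssl

# Placeholder probe request mirroring the -send string of the monitor.
PROBE = (
    b"GET /_cluster/health HTTP/1.1\r\n"
    b"Host: es.example.internal\r\n"
    b"Authorization: Basic <redacted>\r\n"
    b"Connection: close\r\n\r\n"
)

def probe_ok(host, port, expect=b"cluster_name", tls=False):
    """Return True if the response contains the expected token (-recv)."""
    sock = socket.create_connection((host, port), timeout=3)
    if tls:
        # Mirrors curl -k: certificate verification disabled, lab use only.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        sock = ctx.wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(PROBE)
        resp = b""
        while chunk := sock.recv(4096):
            resp += chunk
    return expect in resp
```

Running this with tls=True against a node should succeed exactly like the curl test did; if the logs still show plaintext complaints while only this traffic is in flight, the complaint is coming from some other connection.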

Yes, but that does not mean it's receiving data prior to completing the TLS handshake (the claim in your earlier message); it just means that it's seeing non-TLS data on this channel at some point. It might be before the TLS handshake completes, but it might also be after completing a request.

Agree, it was a statement I most probably picked up from a conversation with ChatGPT :slight_smile:

Appreciate your help. It's just frustrating to debug such issues. We may have to dig into low-level utilities like nstcpdump/nstrace on the ADC nodes; that'll be a first for me.
