HTTP port 9200 behind Citrix ADC load balancing

Trying desperately to stop Elasticsearch from seeing our Citrix ADC health-monitoring traffic against the HTTPS port 9200 backend services as plain text, and thus avoid log error events like:

received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/10.141.33.64:9200, remoteAddress=/10.94.139.12:42849}

The remote address ending in .12 is our Citrix ADC health-monitoring IP.

We have tried various health monitor types: TCP-ECV + Secure:YES, HTTP-ECV + Secure:YES, and https-ecv in various versions. Testing e.g. from the ADC shell with curl -kv https://nodeIP:9200/_cluster/health?pretty returns the cluster status JSON fine and properly completes a TLS 1.3 handshake, but the monitors keep causing the plain-text complaints, spamming the cluster logs :confused:

Despite the Elasticsearch-side log complaints, the monitors all deem the backend services as successfully UP; it's only the log being spammed.

Any hints appreciated, TIA!

Hi @stefws

I am confused what is generating the monitoring health checks.

My experience is that many integrations with Elasticsearch default to http, not https.

As I am sure you know, that is exactly what that message means: whatever is doing the monitoring is sending http to an https endpoint.

OR

your LB is doing SSL termination on the health check URL/endpoint, and thus the LB is receiving https, terminating SSL, then forwarding http to the health check endpoint.

This probably did not help.. but perhaps I am missing something.

Technically, it just means that what ES received on the channel was not a valid TLS record; we don't actually check to make sure it's actually an HTTP request.

In any case this still seems like invalid traffic coming from the monitoring system, despite which it's still finding ES to be healthy. There are no more diagnostics available within ES itself AFAIK; you will need to look at a packet capture to understand exactly what is being sent.
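To illustrate the distinction (a rough sketch, not ES's or Netty's actual code): a TLS record begins with a content-type byte in the 20-23 range followed by a 0x03,0x0X protocol version, whereas a plaintext HTTP request begins with ASCII such as `GET `. A heuristic along these lines is why the request text triggers the "received plaintext http traffic" warning:

```python
def looks_like_tls(first_bytes: bytes) -> bool:
    """Rough heuristic: a TLS record starts with a content-type byte
    (20=change_cipher_spec, 21=alert, 22=handshake, 23=application_data)
    followed by a 0x03,0x0X legacy protocol version. This is NOT the
    real ES/Netty check, just an illustration of why a plaintext
    'GET / HTTP/1.1' fails to parse as a TLS record."""
    if len(first_bytes) < 3:
        return False
    content_type, major, minor = first_bytes[0], first_bytes[1], first_bytes[2]
    return content_type in (20, 21, 22, 23) and major == 3 and minor <= 4

# A TLS 1.x ClientHello begins 0x16 0x03 ...
print(looks_like_tls(b"\x16\x03\x01\x00\xc4"))   # True
# A plaintext HTTP request begins with ASCII 'G' (0x47)
print(looks_like_tls(b"GET /_cluster/health"))   # False
```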


ES is terminating the SSL, thus our LB is only doing SSL_BRIDGE for incoming requests forwarded to the ES endpoints. But to know which ES backend endpoints are healthy to forward/LB requests towards, our LB needs to do backend monitoring. Thus we configure a monitor to probe the health state of ES. However, with every attempt to get the Citrix ADC to use a proper https probing request, e.g. GET /_cluster/health looking for, say, return code 200 or cluster_name in the response, ES keeps claiming that the health probing requests are sending http data before finishing the TLS handshake, or not attempting a TLS handshake at all, thus seeing http on an https-listening socket. The health monitors still claim success, i.e. they do see e.g. cluster_name in the responses, so you would believe that ES is responding as expected and the requests ought to be using https/TLS handshakes. So why is ES still claiming to see data sent before the TLS handshake finishes, we wonder.

So it seems that ES is closing the socket upon receiving data before the TLS handshake has finished; thus the ADC must be doing more than our configured https-ecv requests, as the ADC finds those are responded to successfully. Maybe the ADC is also doing, say, simple hidden TCP syn-ack monitoring, or would such result in what we saw on the Fleet services:

http: TLS handshake error from 10.94.139.12:64091: EOF 

This stopped on the Fleet services when we altered the monitor probe from tcp syn-ack to https-ecv using 'GET /' requests.

How are you determining this? ES is closing the socket because the client doesn’t seem to be doing TLS correctly, resulting in a NotSslRecordException being thrown. I wouldn’t have expected this to be triggered by sending properly-framed data just at the wrong time, instead I think it indicates the data is not properly-framed somehow. Could be wrong, I’m not digging through the TLS implementation to work out the details, but still everything here looks to be a misbehaving client.

I read your code snippet as meaning that the socket, aka channel, is closed on the ES side after logging the warning.

If this is correct, then I'm just puzzled that when asking the ADC to send https ECV monitor requests, it deems the ES services UP and healthy.

ADC config:

add ssl profile elastic_ssl_profile -sslProfileType BackEnd -eRSA DISABLED -sessReuse ENABLED -sessTimeout 300 -tls13 ENABLED -SNIEnable ENABLED

bind ssl profile elastic_ssl_profile -eccCurveName X25519_MLKEM768
bind ssl profile elastic_ssl_profile -eccCurveName X_25519
bind ssl profile elastic_ssl_profile -eccCurveName P_256
bind ssl profile elastic_ssl_profile -eccCurveName P_384
bind ssl profile elastic_ssl_profile -eccCurveName P_224
bind ssl profile elastic_ssl_profile -eccCurveName P_521

add lb monitor elastic_https HTTP-ECV -send "GET /_cluster/health HTTP/1.1\r\nHost: pseudohost.<redacted domain>\r\nAuthorization: Basic <redacted auth>\r\nConnection: close\r\n\r\n" -recv cluster_name -LRTM DISABLED -resptimeout 3 -secure YES -sslProfile elastic_ssl_profile

elastic_https monitor probes are successfully seeing 'cluster_name' in responses, and it is the only monitor bound to probe the ES services.
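For comparison, a well-behaved secure probe must complete the TLS handshake before any HTTP bytes go on the wire. A minimal Python sketch of what the -send string above amounts to (hostname, IP, and credentials are placeholders; certificate verification is disabled like curl -k, for illustration only):

```python
import socket
import ssl

def build_probe(host: str, auth_b64: str) -> bytes:
    # Same shape as the HTTP-ECV -send string in the monitor above.
    return (
        f"GET /_cluster/health HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Authorization: Basic {auth_b64}\r\n"
        f"Connection: close\r\n\r\n"
    ).encode()

def probe(node_ip: str, host: str, auth_b64: str) -> bytes:
    # The TLS handshake happens inside wrap_socket, BEFORE the request
    # is sent, so ES never sees plaintext on the channel.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False      # like curl -k, illustration only
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((node_ip, 9200), timeout=3) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(build_probe(host, auth_b64))
            resp = b""
            while chunk := tls.recv(4096):
                resp += chunk
            return resp

# A healthy probe response would contain b"cluster_name" in the body,
# matching the monitor's -recv pattern.
```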

Yes, but that does not mean it's receiving data prior to completing the TLS handshake (the claim in your earlier message); it just means that it's seeing non-TLS data on this channel at some point. It might be before the TLS handshake completes, but it might also be after completing a request.

Agree, it was a statement I most probably picked up from a conversation with ChatGPT :slight_smile:

Appreciate your help. It's just frustrating to debug such issues. I may have to try digging into low-level utilities like nstcpdump/nstrace on the ADC nodes; that'll be a first time for me.


I believe I've been misled: seeing the ADC's NSIP as the source IP for the plain text made me focus on the ADC health monitoring. It might instead be an issue with Elastic Agents failing to pull local node stats through our configured Fleet policies for Elasticsearch node self-monitoring, as the Elasticsearch Integration > Metrics (Stack Monitoring) -> Settings -> Host is https://localhost:9200 with SSL Configuration: verification_mode: certificate. Such polling ought to come from localhost and not the ADC NSIPs, as seen via tcpdump on a complaining target, e.g.:

$ sudo tcpdump -nn -A -s 0 'tcp port 9200' 2>/dev/null | awk -v adc1="10.94.139.11" -v adc2="10.94.139.12" '
/IP [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ >/ {
    ip = $3; gsub(/\.[0-9]+$/, "", ip)  # strip port
}
/GET|POST|PUT|DELETE/ {print ip, $0}'
...
10.94.139.11 .!@..#..K..4AI.P...]$..GET /_nodes/_local/nodes HTTP/1.1
10.94.139.12 .!@..#....Q1.S.P.......GET / HTTP/1.1
10.94.139.11 .!@|o#..3U.m.&.P...x...GET /_nodes/_local/nodes HTTP/1.1
10.94.139.12 .!@.U#..+U.bMB.P...oW..GET /_nodes/_local/nodes HTTP/1.1
10.94.139.12 .!@7.#.3.5.K.x`P...h...GET /_nodes/_local/nodes HTTP/1.1
10.94.139.11 .!@..#..U..L...P...p...GET /_nodes/_local/nodes HTTP/1.1
10.94.139.11 .!@..#....~pM5.P....S..GET /_nodes/_local/nodes HTTP/1.1
10.94.139.12 .!@.D#..../.f..P...H...GET /_nodes/_local/nodes HTTP/1.1
10.94.139.11 .!@.9#.=Me.....P.......GET /_nodes/_local/nodes HTTP/1.1
10.94.139.12 .!@.j#..{...@aYP.......GET /_nodes/_local/stats/ingest HTTP/1.1
10.94.139.11 .!@N.#.1..\]...P.......GET /_nodes/_local/nodes HTTP/1.1
...

These aren't ADC health monitoring probes, but look very much like Elastic Agents' Elasticsearch integration polling node stat metrics.
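For anyone wanting to post-process a capture offline, the awk filter above can be approximated in Python (reading `tcpdump -nn -A` output from a file or pipe; the IP regexes mirror the awk, and nothing here is specific to ES):

```python
import re
import sys

# Matches a tcpdump header line, e.g. "12:00:01.5 IP 10.94.139.11.42849 > ..."
HDR = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ >")
# Matches a plaintext HTTP request line in the ASCII payload dump.
REQ = re.compile(r"(GET|POST|PUT|DELETE) \S+ HTTP/1\.[01]")

def extract_requests(lines):
    """Yield (source_ip, request_line) pairs, pairing each HTTP request
    line with the source IP of the most recent packet header."""
    ip = None
    for line in lines:
        m = HDR.search(line)
        if m:
            ip = m.group(1)
            continue
        r = REQ.search(line)
        if r and ip:
            yield ip, r.group(0)

if __name__ == "__main__":
    for src, req in extract_requests(sys.stdin):
        print(src, req)
```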

ChatGPT says:

Ah — now I see exactly what you’re puzzling over. Yes, you are correct: the ADC is just bridging TLS, it does not convert anything to plain HTTP. So anything showing up as plain HTTP hitting ES cannot come from the ADC itself.

What is actually happening is likely this chain:
1. Fleet / Elastic Agent configured to poll ES metrics
   - Host: https://localhost:9200
   - SSL verification: certificate
2. Elastic Agent cannot verify the SSL certificate for localhost
   - Because the ES node certificate is issued for its FQDN, not localhost.
   - Verification fails.
3. Fallback behavior
   - Fleet/Agent tries the next configured endpoint, often the Fleet Global Setting pointing to the ADC NSIP (HTTPS).
   - But if that somehow misconfigures the URL scheme or the SSL handshake fails, the agent may retry over plain HTTP to avoid complete failure.
4. Result in the ES logs
   - You see the ADC NSIP as the source.
   - The requests are plain HTTP.
   - These requests are not originating from the ADC, but from the agent itself, using the ADC as the "host" in the URL.

So, the ADC is just an innocent bystander: it never decrypts or converts the request. It's the agent's fallback behavior that triggers plain HTTP requests to ES via the ADC NSIP.

✅ The correct fix is indeed:
- Set verification_mode: none for local node polling so the agent talks HTTPS to localhost without failing cert verification.
- Make sure the agent never needs to "fall back" to the ADC for node stats.

Might agents make some sort of fallback from the Integration -> Metrics -> Host setting?

Mm, possibly, though it seems like a pretty serious bug to fall back to http://localhost if we decided https://localhost couldn't be trusted…

Could you share the whole HTTP request? I'm particularly interested in the User-Agent header; this should help confirm whether it's Elastic Agent or something else.

Once confirmed, I'd suggest you open another discussion in the Elastic Agent category on Discuss the Elastic Stack.

I would rather not believe everything ChatGPT is speculating about.

Our ADC nodes were just appearing as the source IPs, but that's due to the fact that they do NAT/forwarding for the real caller; thus expanding the tcpdump on the active ADC node revealed the real callers' source IPs :slight_smile:

So I believe I've found the culprits now through this.

We have recently migrated a production ES cluster to all-new nodes and retired the previous nodes, which were running self-monitoring both internally and via Metricbeat. The Metricbeats were still running to monitor the nodes' general system metrics, among other application metrics. Only I forgot to disable the elasticsearch + elasticsearch-xpack Metricbeat modules on these previous ES nodes, so it appears they ended up polling /_nodes/ stats against the Metricbeat output FQDN via our ADC vIP, but apparently using http rather than the default output protocol https. Note that the previous production nodes were also complaining about plain text on their SSL port 9200, but back then it was attributed to the ADC health probes not doing proper https when probing, according to a certain chat service on the net; so now we have upgraded the ADC software from v13.1 to the latest v14.1, claimed to be able to do proper https-ecv probes.

We disabled self-monitoring in the production cluster, as we are now using a dedicated monitoring cluster with Elastic Agents and a Fleet policy to poll the ES nodes. This didn't stop the plain-text complaints.

But disabling the previous ES production nodes' Metricbeat elasticsearch modules has stopped the complaints about plain-text http from the new ES nodes behind our ADC LB :slight_smile:

Thanks everyone for playing ping pong on this!


Cool stuff... glad you found it ... those darn leftover bits!

:sweat_smile:

I will change my wording from "exactly" to "commonly indicates http traffic to an https endpoint", especially here on Discuss where folks do this all the time ... like all the time :slight_smile: