I'm in a bare-metal Kubernetes environment, and I see that documents sent from a Fluentd instance are being routed from my ingestor client to a non-existent data node:
2020-07-13 00:08:18 +0000 [warn]: #0 [elasticsearch] failed to flush the buffer. retry_time=12 next_retry_seconds=2020-07-13 00:42:42 +0000 chunk="5aa1cb4f03d198373ac2415b4d783073" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host\", :port=>443, :scheme=>\"https\", :user=>\"user\", :password=>\"obfuscated\"}): No route to host - connect(2) for 10.244.5.18:9200 (Errno::EHOSTUNREACH)"
Specifically, 10.244.5.18:9200 references a node that doesn't exist according to the _nodes endpoint.
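For reference, this is how I'm checking the addresses the cluster advertises (host and credentials below are placeholders for my real ones):

$ curl -s -u user:pass https://host:9200/_nodes/http?pretty | grep publish_address

If I understand the API correctly, these publish_address values are what a sniffing client would pick up.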
I'm also seeing the same error for nodes that do exist:
No route to host - connect(2) for 10.244.21.178:9200 (Errno::EHOSTUNREACH)"
"cT6aQm2sRGm82NWv9aEyHw": {
"name": "master-0",
"transport_address": "10.244.21.178:9300",
"host": "10.244.21.178",
"ip": "10.244.21.178",
Note that from within the client pod itself, I can reach that address:
[root@elasticsearch-es-client-f65788c6b-qqmhp elasticsearch]# curl 10.244.21.178:9200 -u xxxxx:yyyyyy
{
"name" : "master-0",
...
}
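I still need to repeat that check from the Fluentd pod itself, since routing can differ per pod; something like this (the namespace and pod name are placeholders for my actual ones):

$ kubectl exec -n logging fluentd-xxxxx -- curl -s -u user:pass 10.244.21.178:9200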
I figure this may be a Kubernetes-specific routing issue, but the fact that an old IP is still being referenced for some documents also concerns me.
Where exactly do the clients fetch the node information from? I want to check whether it is outdated and update it somehow, but I don't know where to look.
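My working theory is that the client library sniffs the cluster via _nodes and caches per-pod IPs. If so, pinning the output to the Kubernetes Service and disabling the sniffing might sidestep stale entries. A minimal sketch of what I have in mind, assuming fluent-plugin-elasticsearch; the Service name here is a placeholder and I haven't verified this helps:

<match **>
  @type elasticsearch
  # Point at the Kubernetes Service instead of individual pod IPs
  host elasticsearch-es-http
  port 9200
  scheme https
  user user
  password password
  # Don't re-sniff _nodes for per-pod addresses
  reload_connections false
  reload_on_failure false
</match>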
Any immediate ideas on why this may happen?
I'm checking the CNI side of things as we speak.
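Meanwhile, to confirm whether the stale address belongs to any current pod at all:

$ kubectl get pods -A -o wide | grep 10.244.5.18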