Curator not working with istio - fail to reach ES sporadically

I am trying to run Elasticsearch with Istio in a Kubernetes environment.
Elasticsearch-oss: 7.0.1
Istio 1.5.2
Istio sidecars are injected into the Elasticsearch pods (master and data nodes) as well as into the elasticsearch-curator pod.

The Elasticsearch cluster forms without problems, and the Elasticsearch REST API keeps responding.
Curator is configured to run as a k8s CronJob every 5 minutes to delete old indices. At times, Curator is able to reach ES and delete indices as expected, and the job succeeds.
But much of the time it fails to reach ES with this error message:

2020-05-27 01:01:36,036 ERROR     HTTP N/A error: HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c41220dd0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2020-05-27 01:01:36,037 CRITICAL  Curator cannot proceed. Exiting.

I ran a script that queries the Elasticsearch REST API on port 9200 (_cat/nodes) every 5 seconds. The service was responsive and reachable the whole time.
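For anyone who wants to reproduce this kind of check, here is a minimal sketch of such a polling script. It is not the script I actually ran: the function names `check_once` and `poll` are made up for illustration, and the URL in the usage note assumes the service is reachable as `elasticsearch:9200`, as in the error message above.

```python
import time
import urllib.error
import urllib.request


def check_once(url, timeout=3.0, fetch=urllib.request.urlopen):
    """Issue one GET and return (ok, detail). `fetch` is injectable for testing."""
    try:
        with fetch(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError subclasses OSError, so this also covers refused
        # connections, DNS failures, and timeouts.
        return False, f"{type(exc).__name__}: {exc}"


def poll(url, interval=5.0, rounds=None, check=check_once, sleep=time.sleep):
    """Probe `url` every `interval` seconds; return the number of failures.

    With rounds=None the loop runs until interrupted.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        ok, detail = check(url)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {'OK  ' if ok else 'FAIL'} {detail}")
        if not ok:
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return failures
```

Calling `poll("http://elasticsearch:9200/_cat/nodes")` from a pod in the same namespace would print one timestamped line every 5 seconds. Note that where the probe runs matters: a probe from a long-running pod does not necessarily see what a freshly started job pod sees.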

The same chart and configuration work fine when installed without Istio enabled in the namespace. Can you help us find the root cause of the issue? Is any additional configuration required?

Curator with debug logs attached (failed case) -

Curator logs with istio (success case) -

I cannot begin to explain why you would see this error message, but it is not a Curator problem.

ConnectionRefusedError: [Errno 111] Connection refused

Curator isn't the source of this message; it comes from urllib3, the underlying HTTP connectivity module. Curator is unable to connect not because the target remote is unavailable, but because it is refusing to permit Curator (well, urllib3) to connect. So it seems to me that whatever is managing connections in Kubernetes is a good place to start looking. "Refused" could mean access control is denying access, for example. I also cannot explain why it works occasionally but not always. My guess is that it's something inside k8s.
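One way to narrow this down is to take Curator and urllib3 out of the picture and test the raw TCP connection from inside the Curator pod. The following is only a sketch under assumptions: `classify_connect` and `wait_for_port` are hypothetical helper names, and the host and port are taken from the error message above. It uses only the Python standard library.

```python
import errno
import socket
import time


def classify_connect(host, port, timeout=3.0):
    """Attempt one TCP connection and describe the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Errno 111 on Linux: the connection attempt was actively rejected.
        return "refused"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        return "dns-failure"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"


def wait_for_port(host, port, attempts=10, delay=2.0,
                  probe=classify_connect, sleep=time.sleep):
    """Retry until a connection succeeds; return the outcome of every attempt."""
    outcomes = []
    for attempt in range(attempts):
        outcome = probe(host, port)
        outcomes.append(outcome)
        if outcome == "connected":
            break
        if attempt < attempts - 1:
            sleep(delay)
    return outcomes
```

Running `wait_for_port("elasticsearch", 9200)` at the very start of the job and logging the returned list shows whether the refusals are constant or only occur on the first few attempts. If they clear up after a few seconds, that would suggest something in the connection path is not ready when the job starts, but that is a hypothesis to verify, not a diagnosis.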

