Bulk via Python to Kubernetes cluster


We are heavily indexing into Elasticsearch 8.6.1 from Python code using the bulk API.
Our cluster runs on Kubernetes with the Elastic operator (ECK), which creates Kubernetes services for the cluster; the HTTP service is the one used below.
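For context, a minimal sketch of the bulk-indexing side. The async bulk helper consumes an iterable of action dicts like the ones produced here; the index name and document shape are made up for illustration, and the actual helper call is shown only as a comment since it needs a live cluster.

```python
from typing import Iterable, Iterator

def generate_actions(docs: Iterable[dict], index: str = "my-index") -> Iterator[dict]:
    """Turn plain documents into bulk-API action dicts (hypothetical index name)."""
    for doc in docs:
        yield {"_index": index, "_op_type": "index", "_source": doc}

# In the real code these actions would be fed to the async helper, e.g.:
#   from elasticsearch.helpers import async_bulk
#   await async_bulk(es_conn, generate_actions(docs))
```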


In Python I'm creating the Elasticsearch client like this:

from elasticsearch import AsyncElasticsearch

es_conn = AsyncElasticsearch(hosts="http://my-cluster-es-http.my-ns.svc.cluster.local:9200")

On one hand, I think this is good because the service wraps all the nodes and will always route to an available one. On the other hand, it feels like it might be better to give the client the full node list, so it can do something smarter for faster interaction (round robin?) and ultimately faster indexing.
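A toy illustration of the round-robin idea, assuming the client is configured with a static host list (this is not the library's actual code, and the node addresses are hypothetical): successive requests simply cycle through the configured nodes.

```python
import itertools

# Hypothetical node addresses for illustration only.
hosts = [
    "http://node-0:9200",
    "http://node-1:9200",
    "http://node-2:9200",
]

# itertools.cycle gives an endless round-robin iterator over the hosts.
rr = itertools.cycle(hosts)

def next_node() -> str:
    """Pick the node for the next request, round-robin style."""
    return next(rr)
```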

Which one is correct? Am I doing it right?

Hi @Itay_Bittan,
I haven't used the bulk API via Python, but according to the documentation you should be able to define multiple hosts if you want. Trying out a hosts list should tell you whether it gives you better performance.

Thanks @Wave!
I saw that, and that's why I asked.
On the other hand, a Kubernetes Service resource should give you a kind of abstraction, so you won't have to update your application's host list whenever a node joins or leaves the cluster.
I wonder whether either approach guarantees some kind of load balancing across all nodes.
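One possible middle ground between the service abstraction and a static host list is the client's sniffing options, which (if I understand the docs correctly) let the client discover the node list itself starting from a single address. This is a sketch, not a tested recommendation; the option names are from the 8.x elasticsearch-py client, and the import is done lazily so the snippet stays importable without the package. Note that in Kubernetes, sniffed addresses are pod IPs, which should be reachable in-cluster but not from outside.

```python
# Assumed elasticsearch-py 8.x sniffing options (verify against your version).
SNIFF_OPTS = {
    "sniff_on_start": True,        # fetch the cluster's node list at startup
    "sniff_on_node_failure": True, # refresh it when a node stops responding
}

def make_client(service_url: str):
    """Build an async client that discovers nodes behind a single service URL."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from elasticsearch import AsyncElasticsearch
    return AsyncElasticsearch(hosts=service_url, **SNIFF_OPTS)
```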

That makes sense. My hunch is that a host list will load balance; that's how it works elsewhere in the Elastic stack when multiple hosts are provided. It might be a good experiment to find out for sure.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.