Bulk via Python to Kubernetes cluster

Itay_Bittan · March 31, 2023, 3:18pm

Hi!

We index heavily into Elasticsearch 8.6.1 from Python using the bulk API.
Our cluster runs on Kubernetes with the Elastic operator, which creates the following services:

my-cluster-es-data
my-cluster-es-http
my-cluster-es-internal-http
my-cluster-es-master
my-cluster-es-transport
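If you want the client to see individual nodes without hard-coding them, one option might be to resolve one of the per-nodeSet services instead of `-es-http`. This sketch assumes `my-cluster-es-data` is a headless service (ECK normally creates one per nodeSet; `kubectl get svc` shows `None` under CLUSTER-IP for a headless service), so DNS returns one A record per pod. `node_urls_from_service` is a hypothetical helper, and its result is only a snapshot: pods that get rescheduled come back with new IPs.

```python
import socket


def node_urls_from_service(service_host: str, port: int = 9200, scheme: str = "http") -> list[str]:
    """Resolve a service name to one URL per IPv4 address it returns.

    For a headless Kubernetes service (assumption), each A record is a pod IP,
    so the result has one URL per Elasticsearch pod. For a regular ClusterIP
    service you would get a single virtual IP instead.
    """
    infos = socket.getaddrinfo(service_host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    ips = sorted({info[4][0] for info in infos})
    return [f"{scheme}://{ip}:{port}" for ip in ips]
```

Inside the cluster, something like `node_urls_from_service("my-cluster-es-data.my-ns.svc.cluster.local")` would produce a list that can be passed as `hosts`; for a normal ClusterIP service such as `-es-http` it would return just the one virtual IP.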

In Python, I create the client like this:

from elasticsearch import AsyncElasticsearch
es_conn = AsyncElasticsearch(hosts="http://my-cluster-es-http.my-ns.svc.cluster.local:9200")

On one hand, I think this is good because the service fronts all the nodes and will always route to an available one. On the other hand, it feels like it might be better to give the client the full list of nodes, so it can do something smarter (round robin?) that leads to faster interaction and, ultimately, faster indexing.
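To make the round-robin idea concrete, here is a minimal stdlib sketch (not the client's actual code) that rotates bulk batches over a fixed list of node URLs. As far as I know, elasticsearch-py 8.x already rotates this way by default when `hosts` is a list (its `node_selector_class` option defaults to round robin). Also worth noting: a ClusterIP Service is balanced by kube-proxy per TCP connection, not per HTTP request, so a long-lived client reusing keep-alive connections may keep hitting the same few pods. The URLs below are hypothetical.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator


def round_robin(hosts: list[str]) -> Iterator[str]:
    """Return an endless rotation over hosts: h0, h1, ..., h(n-1), h0, ..."""
    if not hosts:
        raise ValueError("need at least one host")
    return itertools.cycle(hosts)


def assign_batches(batches: Iterable[object], hosts: list[str]) -> Counter:
    """Count how many bulk batches each host would receive under round robin."""
    picker = round_robin(hosts)
    # zip stops when the batches run out; the host rotation itself is endless.
    return Counter(host for _batch, host in zip(batches, picker))


# Hypothetical pod URLs, for illustration only.
HOSTS = ["http://10.0.0.11:9200", "http://10.0.0.12:9200", "http://10.0.0.13:9200"]

if __name__ == "__main__":
    print(assign_batches(range(10), HOSTS))
```

With 10 batches over 3 hosts this gives a 4/3/3 split. Spreading requests this way only balances the coordinating work: each node still forwards the per-shard parts of a bulk request to the nodes holding those shards, so the indexing itself follows shard placement.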

Which approach is correct? Am I doing it right?
Thanks

Wave · April 4, 2023, 10:14pm

Hi @Itay_Bittan,
I haven't used the bulk API via Python, but according to the documentation you should be able to define multiple hosts. Trying that out should tell you whether it gives you better performance.

Itay_Bittan · April 5, 2023, 5:59am

Thanks @Wave!
I saw that, which is why I asked.
On the other hand, a Kubernetes Service resource gives you an abstraction, so you don't have to update your application (the hosts list) whenever a node joins or leaves the cluster.
I wonder whether either approach guarantees some kind of load balancing across all nodes.
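Sniffing might address this concern: keep the Service URL as a single seed and let the client ask the cluster for its nodes. A minimal sketch, assuming elasticsearch-py 8.x, where `sniff_on_start`, `sniff_on_node_failure` and `min_delay_between_sniffing` are client options; `client_options` and `make_client` are hypothetical helpers, and the 60-second delay is an arbitrary choice. Sniffed nodes are contacted at their HTTP publish addresses, which on Kubernetes are typically pod IPs, so this should work from pods inside the cluster but not from outside it.

```python
from typing import Any

# Seed URL from the original post; the client discovers the other nodes from it.
SEED = "http://my-cluster-es-http.my-ns.svc.cluster.local:9200"


def client_options(seed: str = SEED) -> dict[str, Any]:
    """Keyword arguments for AsyncElasticsearch (assumes elasticsearch-py 8.x)."""
    return {
        "hosts": [seed],
        # Ask the seed for the full node list before sending real requests.
        "sniff_on_start": True,
        # Re-discover nodes when a request to one of them fails (e.g. a pod restarted).
        "sniff_on_node_failure": True,
        # Do not sniff more often than once a minute (arbitrary value).
        "min_delay_between_sniffing": 60,
    }


def make_client(seed: str = SEED):
    """Create the async client; imported lazily so this module loads without the package."""
    from elasticsearch import AsyncElasticsearch

    return AsyncElasticsearch(**client_options(seed))
```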

Wave · April 5, 2023, 1:46pm

That makes sense. My hunch is that using a host list will load balance; that's how it works everywhere else in the Elastic Stack when multiple hosts are provided. It might be a good experiment to find out for sure.
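For that experiment, one possible measurement is the per-node counters from `GET _nodes/stats/indexing_pressure`: as far as I know, `memory.total.coordinating_in_bytes` accumulates the bytes of indexing requests each node coordinated, so comparing snapshots taken before and after a run shows how the bulk traffic was spread for a given client setup. The sketch assumes the endpoint is reachable without TLS or authentication, as in the original post; `BASE_URL` and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Hypothetical URL; reuse whatever endpoint the indexer already talks to.
BASE_URL = "http://my-cluster-es-http.my-ns.svc.cluster.local:9200"


def fetch_indexing_pressure(base_url: str = BASE_URL) -> dict[str, Any]:
    """GET _nodes/stats/indexing_pressure (assumes no TLS or auth)."""
    url = f"{base_url}/_nodes/stats/indexing_pressure"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def coordinating_bytes(stats: dict[str, Any]) -> dict[str, int]:
    """Map node name -> total bytes of indexing requests that node coordinated."""
    return {
        node["name"]: node["indexing_pressure"]["memory"]["total"]["coordinating_in_bytes"]
        for node in stats["nodes"].values()
    }


def coordinating_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-node bytes coordinated between two snapshots (e.g. around one indexing run)."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Running `coordinating_bytes(fetch_indexing_pressure())` before and after an indexing run with each client setup, then comparing the deltas, should show whether one node ends up coordinating most of the bulk traffic. Counters reset when a node restarts, so a negative delta most likely means that node restarted during the run.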

