We index heavily into Elasticsearch 8.6.1 from Python code, using the bulk API.
Our cluster runs on Kubernetes with the elastic operator which creates the following services:
On one hand, I think pointing the client at a service is good, because the service wraps all nodes and will always route to an available one. On the other hand, it feels like it might be better to give the client the full list of nodes, so that it can spread requests across them itself (round robin?) and ultimately index faster.
Hi @Itay_Bittan,
I haven't used the bulk API via Python, but according to the documentation it looks like you should be able to define multiple hosts if you want. Trying out a hosts list should tell you whether it gives you better performance.
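For reference, here is a minimal sketch of what passing several hosts could look like with the 8.x Python client. The node URLs, the index name, and the `make_actions` / `bulk_index` helpers are made up for illustration; only `Elasticsearch(...)` and `helpers.bulk(...)` come from the client library. As far as I know the client rotates across the hosts it is given, but treat that as an assumption to verify rather than a guarantee.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical node URLs; replace with the addresses your cluster exposes.
NODE_URLS: List[str] = [
    "https://es-node-0.example.internal:9200",
    "https://es-node-1.example.internal:9200",
    "https://es-node-2.example.internal:9200",
]


def make_actions(index: str, docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Wrap each document in the action format expected by helpers.bulk."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


def bulk_index(hosts: List[str], index: str, docs: Iterable[Dict[str, Any]],
               **client_kwargs: Any) -> Tuple[int, Any]:
    """Index docs through a client configured with every URL in `hosts`.

    Pass a single-element list (for example the Kubernetes service URL)
    to compare the two setups with the same code.
    """
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(hosts, **client_kwargs)
    try:
        # Returns (number of successful actions, list of errors).
        return helpers.bulk(client, make_actions(index, docs))
    finally:
        client.close()
```

Calling `bulk_index(NODE_URLS, "my-index", docs, basic_auth=("user", "password"))` and then the same call with only the service URL would give a like-for-like timing comparison.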
thanks @Wave!
I saw it and that's why I asked.
On the other hand, a Kubernetes service resource should give you a layer of abstraction, so you won't have to update your application (the hosts list) whenever a node joins or leaves the cluster.
I wonder whether either approach guarantees some form of load balancing across all nodes.
That makes sense. My hunch is that using a host list will load balance. That's how it works everywhere else in the Elastic Stack when multiple hosts are provided. Might be a good experiment to find out for sure.
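One way to run that experiment, sketched under some assumptions: ask the cluster which node answered each request and count the answers. The `tally_nodes` and `responding_node_names` helpers below are hypothetical; the idea relies on the nodes info API with the `_local` node filter, which reports the node that received the request.

```python
from collections import Counter
from typing import Any, Callable, List, Mapping


def responding_node_names(nodes_info: Mapping[str, Any]) -> List[str]:
    """Extract node names from a nodes-info style response body."""
    return [node["name"] for node in nodes_info.get("nodes", {}).values()]


def tally_nodes(fetch_local_info: Callable[[], Mapping[str, Any]],
                requests: int = 100) -> Counter:
    """Call fetch_local_info repeatedly and count which node answered."""
    counts: Counter = Counter()
    for _ in range(requests):
        counts.update(responding_node_names(fetch_local_info()))
    return counts


# Usage with a real client (assumes `client` is an Elasticsearch instance):
#   counts = tally_nodes(lambda: client.nodes.info(node_id="_local").body)
#   print(counts)
```

Running it once with the hosts list and once with the service URL should show how evenly each setup spreads requests. An even count suggests balancing, while a single dominant node suggests requests are sticking to one connection.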