I am trying to understand the correct way of doing "Connection Pooling" with the Python Client 8.10 API.
In the old 7.x docs I found a little information about this, but it doesn't seem to exist in the 8.x docs (at least I couldn't find it).
My current thinking is that if I simply pass a list of connection nodes to the Elasticsearch constructor, it manages node selection and dead connections for me?
from elasticsearch import Elasticsearch

connection_nodes = [
    "https://elasticsearch-dev-1:500",
    "https://elasticsearch-dev-2:500",
    "https://elasticsearch-dev-3:500",
]

# Does this mean that if the "elasticsearch-dev-1" and "elasticsearch-dev-2"
# nodes are down for whatever reason, then "elasticsearch-dev-3" is selected
# from the pool?
es = Elasticsearch(hosts=connection_nodes)
You can read more about the different ways to connect to multiple nodes, with examples, in the docs here:
You can use dictionaries when referencing the hosts if you need per-node parameters, want to turn on sniffing, or use different authentication methods. The hosts argument must be a list of dictionaries or of host[:port] strings, which are translated to dictionaries automatically, as in these examples:
es = Elasticsearch(['localhost:443', 'other_host:443'])

es = Elasticsearch([
    {'host': 'localhost'},
    {'host': 'othernode', 'port': 443, 'url_prefix': 'es', 'use_ssl': True},
])
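Note that those dictionary examples come from the 7.x-style API; in 8.x the client generally takes full URLs plus keyword arguments instead. A minimal 8.x-style sketch, where the hostnames, password, and certificate path are placeholders you would replace with your own:

from elasticsearch import Elasticsearch

# 8.x style: nodes are given as full URLs, and sniffing is enabled
# via constructor flags rather than dictionary options.
es = Elasticsearch(
    hosts=[
        "https://elasticsearch-dev-1:9200",
        "https://elasticsearch-dev-2:9200",
        "https://elasticsearch-dev-3:9200",
    ],
    basic_auth=("elastic", "<password>"),  # or api_key=...
    ca_certs="/path/to/http_ca.crt",       # CA for the cluster's TLS cert
    sniff_on_start=True,                   # discover cluster nodes at startup
    sniff_on_node_failure=True,            # refresh the node list when a node fails
)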
The transport layer will create an instance of the selected connection class per node and keep track of the health of individual nodes: if a node becomes unresponsive (throws exceptions while connecting to it), it is put on a timeout by the ConnectionPool class and only returned to circulation after the timeout is over (or when no live nodes are left). By default, nodes are randomized before being passed into the pool, and a round-robin strategy is used for load balancing.
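In the 8.x client, that retry and dead-node timeout behavior can, to my knowledge, be tuned through constructor options that are passed down to the transport's node pool. A hedged sketch, where the node URLs and numeric values are illustrative only:

from elasticsearch import Elasticsearch

# A sketch of tuning retry and dead-node behavior (values are illustrative):
es = Elasticsearch(
    hosts=[
        "https://elasticsearch-dev-1:9200",
        "https://elasticsearch-dev-2:9200",
    ],
    retry_on_timeout=True,         # retry the request on another node after a timeout
    max_retries=3,                 # give up after this many retries
    dead_node_backoff_factor=1.0,  # base of the exponential "dead" timeout, in seconds
    max_dead_node_backoff=30.0,    # cap on how long a node stays benched, in seconds
)

With options like these, a failing elasticsearch-dev-1 would be benched for exponentially increasing timeouts while requests round-robin across the remaining live nodes, which is the behavior the question is asking about.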