Connection pooling in Python

Hi,
I am using Python to connect to an Elasticsearch cluster.
Number of nodes = 3; each node has a user ID and password.

I am trying to implement connection pooling and referred to the documentation for the Python API at:
https://elasticsearch-py.readthedocs.io/en/master/connection.html
(under the "Connection Pool" section), but could not get it to work.

Can someone please point me to a working example of connection pooling in Python?

Thanks & Regards,
Sachin

I got this to work. Sample code below:

from elasticsearch import Transport

try:
    qry = {"query": {"bool": {"must": [{"match": {"extension": "css"}}, {"match": {"machine.os": "ios"}}]}}}

    # start with two hosts in the pool; http_auth is passed on to each connection
    transport = Transport([{'host': '<your_host1>'}, {'host': '<your_host2>'}], http_auth=('<your_user>', '<user_password>'))
    print('no. of connections in connection pool before adding connections = ' + str(len(transport.connection_pool.connections)))
    # you may increase the range value from 2 to the required number as per your requirement
    for cntr in range(2):
        transport.add_connection({'host': '<your_host1>'})
        transport.add_connection({'host': '<your_host2>'})
    print('no. of connections in connection pool after adding connections = ' + str(len(transport.connection_pool.connections)))
    # each get_connection() call retrieves a connection from the pool
    connections = [transport.get_connection() for _ in range(10)]

    result = transport.perform_request(method='GET', url='/<index_name_to_query>/_search', body=qry)
    for doc in result['hits']['hits']:
        print('got data...')

except Exception as e:
    print('exception...')
    print(str(e))

Hi @ssharma7884,

You do not need to call the Transport class directly.

You can just initialize the Elasticsearch class and access all functions through it.

Check the link to see examples:
https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch

Hi bry-c,
Thanks for the link.
I was initially using the Elasticsearch class as below:

from elasticsearch import Elasticsearch
....
elastic_conn = Elasticsearch(['<my_host>'], http_auth=('<my_user>', '<my_user_pwd>'))
qry = {"query": {"bool": {"must": [{"match": {"extension": "css"}}, {"match": {"machine.os": "ios"}}]}}}
res = elastic_conn.search(index="<my_index_name>", body=qry)
for doc in res['hits']['hits']:
    print('got data...')

But I need to set up connection pooling at server start-up and use a connection from this pool (via transport.get_connection()) for all future queries. I could not find out how to implement connection pooling with the above, hence I did it using the "Transport" class.

How can I implement connection pooling (at server start up) as per your link?

Thanks & Regards,
Sachin

Hi @ssharma7884,

If you use Django, you can create a module inside your project folder.

# es_conn.py
import elasticsearch
from django.conf import settings

es = elasticsearch.Elasticsearch(settings.ELASTIC_SERVER, **settings.ELASTIC_CONFIG)

The es instance will only be initialized once, on start-up.
The Elasticsearch instance already creates a pool of connections from the provided array of hosts.

Then you can use it like this in another module.

# other_module.py
from es_conn import es

es.exists(index='index_name', doc_type='doc', id='123')
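Outside Django, the same "create once, import everywhere" idea can be sketched with a small lazy getter. This is only a rough sketch, not something from the library docs: get_client and the factory argument are hypothetical names, and the factory is whatever callable builds your Elasticsearch instance. The lock just makes sure two threads that arrive at the same time do not both build a client.

```python
import threading

_client = None                 # the one shared client for this process
_lock = threading.Lock()       # guards first-time creation


def get_client(factory):
    """Return the process-wide client, building it with factory() on first use.

    `factory` is a hypothetical zero-argument callable, for example:
        lambda: Elasticsearch(['<my_host>'], http_auth=('<my_user>', '<my_user_pwd>'))
    """
    global _client
    if _client is None:
        with _lock:
            # re-check inside the lock: another thread may have created it already
            if _client is None:
                _client = factory()
    return _client
```

Every caller that goes through get_client(...) gets the same object back, so the pool of connections behind it is created only once per process.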

bry-c,
Thanks for your comments.
I am using Flask and apache2. My existing connection pooling implementation is similar to what you have mentioned.
After your reply, I re-read the following links/sections, which gave me the details and clarifications I had missed initially:

https://elasticsearch-py.readthedocs.io/en/master/#persistent-connections
elasticsearch-py uses persistent connections inside of individual connection pools (one per each configured or sniffed node)

https://elasticsearch-py.readthedocs.io/en/master/#thread-safety
By default we allow urllib3 to open up to 10 connections to each node, if your application calls for more parallelism, use the maxsize parameter to raise the limit:

#maxsize parameter for connection poolsize
es = Elasticsearch(["host1", "host2"], maxsize=25)

Query

  1. Here, I am assuming that the "maxsize" parameter sets the number of persistent connections per configured node.
  2. As per the thread-safety link above :
    If your application is long-running consider turning on Sniffing to make sure the client is up to date on the cluster location.

My application is long-running (almost 24x7), so should I turn on the sniffing mechanism?

Thanks & Regards,
Sachin Vyas.

I also came across the following 2 links, which provided further clarification:

https://elasticsearch-py.readthedocs.io/en/master/connection.html#elasticsearch.Urllib3HttpConnection

Refer to the "maxsize" parameter comments under the above link, and also the link:
https://urllib3.readthedocs.io/en/1.4/pools.html#api

Thanks & Regards,
Sachin Vyas.

Hi @ssharma7884,

Yes, it's better to enable sniffing, as mentioned in the documentation.
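
As a rough sketch of how maxsize and sniffing could be combined: the helper names below (build_client_kwargs, make_client) are made up for illustration, and the values 25 and 60 are arbitrary. The sniffing options (sniff_on_start, sniff_on_connection_fail, sniffer_timeout) are the ones I know from the older elasticsearch-py client discussed in this thread; newer major versions of the client changed some of these names, so please check the docs for the version you have installed.

```python
def build_client_kwargs(maxsize=25, sniff=True, sniffer_timeout=60):
    """Collect keyword arguments for Elasticsearch(...).

    maxsize: connections urllib3 may keep open per node.
    sniff:   whether to ask the cluster for its current node list.
    """
    kwargs = {"maxsize": maxsize}
    if sniff:
        kwargs.update(
            sniff_on_start=True,            # fetch the node list before the first request
            sniff_on_connection_fail=True,  # refresh the list when a node stops responding
            sniffer_timeout=sniffer_timeout,  # and refresh again after this many seconds
        )
    return kwargs


def make_client(hosts, user, password, **overrides):
    """Build the client; the import is done here so the helper above has no dependencies."""
    from elasticsearch import Elasticsearch
    return Elasticsearch(hosts, http_auth=(user, password), **build_client_kwargs(**overrides))
```

For example, make_client(["host1", "host2"], "<my_user>", "<my_user_pwd>", maxsize=25) would be called once at start-up and the returned object reused for all queries.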