Connection pooling in Python

ssharma7884 · July 10, 2019, 10:46am

Hi,
I am using Python to connect to an Elasticsearch cluster.
The cluster has 3 nodes, and each node has a user ID and password.

I am trying to implement connection pooling and followed the Python API documentation at:
https://elasticsearch-py.readthedocs.io/en/master/connection.html
(under the "Connection Pool" section), but could not get it to work.

Can someone please point me to a working example of connection pooling in Python?

Thanks & Regards,
Sachin

ssharma7884 · July 18, 2019, 8:46am

I got this to work. Sample code below:

from elasticsearch import Transport

try:
    qry = {"query": {"bool": {"must": [{"match": {"extension": "css"}}, {"match": {"machine.os": "ios"}}]}}}

    # create the transport with the initial hosts; each host becomes a connection in the pool
    transport = Transport([{'host': '<your_host1>'}, {'host': '<your_host2>'}], http_auth=('<your_user>', '<user_password>'))
    print('no. of connections in connection pool before adding connections = ' + str(len(transport.connection_pool.connections)))
    # increase the range value from 2 to the required number as per your requirement
    for cntr in range(2):
        transport.add_connection({'host': '<your_host1>'})
        transport.add_connection({'host': '<your_host2>'})
    print('no. of connections in connection pool after adding connections = ' + str(len(transport.connection_pool.connections)))
    # each get_connection() call returns a connection selected from the pool
    connections = [transport.get_connection() for _ in range(10)]

    result = transport.perform_request(method='GET', url='/<index_name_to_query>/_search', body=qry)
    for doc in result['hits']['hits']:
        print('got data...')

except Exception as e:
    print('exception...')
    print(str(e))
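
A note on what the repeated get_connection() calls do: as far as I understand, the client's connection pool selects connections round-robin by default, so the calls cycle through the pooled connections rather than reserving one per caller. A minimal stdlib-only sketch of that selection idea (RoundRobinPool is a hypothetical name used for illustration, not the library's class):

```python
class RoundRobinPool:
    """Toy pool that hands out its connections in rotating order."""

    def __init__(self, connections):
        if not connections:
            raise ValueError("pool needs at least one connection")
        self._connections = list(connections)
        self._next = 0  # index of the connection to hand out next

    def get_connection(self):
        # Pick the next connection and advance the counter, wrapping around.
        conn = self._connections[self._next % len(self._connections)]
        self._next += 1
        return conn


# Placeholder strings stand in for real connection objects.
pool = RoundRobinPool(["host1", "host2"])
picked = [pool.get_connection() for _ in range(4)]
print(picked)
```

With two entries in the pool, four calls alternate between them, which is why holding on to cnn1..cnn10 style variables does not give ten distinct connections.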

bry-c · July 19, 2019, 4:28am

Hi @ssharma7884,

You do not need to use the Transport class directly.

You can just initialize the Elasticsearch class and access all functions through it.

Check the link to see examples:
https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch

ssharma7884 · July 19, 2019, 6:50am

Hi bry-c,
Thanks for the link.
I was initially using the Elasticsearch class as below:

from elasticsearch import Elasticsearch
....
elastic_conn = Elasticsearch(['<my_host>'], http_auth=('<my_user>', '<my_user_pwd>'))
qry = {"query": {"bool": {"must": [{"match": {"extension": "css"}}, {"match": {"machine.os": "ios"}}]}}}
res = elastic_conn.search(index="<my_index_name>", body=qry)
for doc in res['hits']['hits']:
    print('got data...')

However, I need to set up connection pooling at server startup and use a connection from this pool (via transport.get_connection()) for all future queries. I could not find how to implement connection pooling with the Elasticsearch class, so I did it using the "Transport" class instead.

How can I implement connection pooling (at server start up) as per your link?

Thanks & Regards,
Sachin

bry-c · July 19, 2019, 8:37am

Hi @ssharma7884,

If you use Django, you can create a module inside your project folder.

# es_conn.py
import elasticsearch
from django.conf import settings

es = elasticsearch.Elasticsearch(settings.ELASTIC_SERVER, **settings.ELASTIC_CONFIG)

The es instance is initialized only once, at startup.
The Elasticsearch instance already creates a pool of connections from the provided list of hosts.

Then you can use it like this in another module.

# other_module.py
from es_conn import es

es.exists(index='index_name', doc_type='doc', id='123')
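
If you are not on Django, the same "create once, reuse everywhere" idea can be sketched without any framework. This is only a minimal sketch: get_client() and the factory argument are hypothetical names for illustration, and a fake factory stands in for the real client so the snippet runs on its own.

```python
import threading

_client = None
_client_lock = threading.Lock()


def get_client(factory):
    """Return the shared client, creating it with factory() on first use."""
    global _client
    if _client is None:
        # Double-checked locking so concurrent first calls create one client only.
        with _client_lock:
            if _client is None:
                _client = factory()
    return _client


# Stand-in factory; it records how many times a client was actually built.
calls = []


def fake_factory():
    calls.append(1)
    return object()


first = get_client(fake_factory)
second = get_client(fake_factory)
print(first is second, len(calls))
```

In a real application the factory would build the Elasticsearch client with your hosts and credentials. Note that module-level state like this is per process, so each server worker process ends up with its own client and its own pool.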

ssharma7884 · July 21, 2019, 8:31am

bry-c,
Thanks for your comments.
I am using Flask and Apache2. My existing connection pooling implementation is similar to what you have described.
After your reply, I re-read the following links/sections, which gave me the details and clarifications I had missed initially:

https://elasticsearch-py.readthedocs.io/en/master/#persistent-connections
elasticsearch-py uses persistent connections inside of individual connection pools (one per each configured or sniffed node)

https://elasticsearch-py.readthedocs.io/en/master/#thread-safety
By default we allow urllib3 to open up to 10 connections to each node, if your application calls for more parallelism, use the maxsize parameter to raise the limit:

#maxsize parameter for connection poolsize
es = Elasticsearch(["host1", "host2"], maxsize=25)
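
To make the sizing concrete: going by the two passages quoted above (one pool per node, and maxsize as the limit for each node), a rough upper bound on open connections for one client is nodes × maxsize. A small sketch of that arithmetic, where max_open_connections is a hypothetical helper and not part of elasticsearch-py:

```python
def max_open_connections(hosts, maxsize=10):
    """Rough upper bound on open connections for one client instance.

    Assumes one urllib3 pool per configured node, each capped at maxsize.
    """
    if maxsize < 1:
        raise ValueError("maxsize must be at least 1")
    return len(hosts) * maxsize


print(max_open_connections(["host1", "host2"], maxsize=25))
print(max_open_connections(["host1", "host2", "host3"]))
```

This bound is per client instance, so a server running several worker processes would multiply it by the number of processes.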

Query

Here, I am assuming that the "maxsize" parameter is the number of persistent connections per configured node.
As per the thread-safety link above :
If your application is long-running consider turning on Sniffing to make sure the client is up to date on the cluster location.

My application is long-running, almost 24x7, so should I turn on the sniffing mechanism?

Thanks & Regards,
Sachin Vyas.

ssharma7884 · July 21, 2019, 8:46am

I also came across the following 2 links, which provided further clarification:

https://elasticsearch-py.readthedocs.io/en/master/connection.html#elasticsearch.Urllib3HttpConnection

Refer to the "maxsize" parameter comments under the above link, and also see:
https://urllib3.readthedocs.io/en/1.4/pools.html#api

Thanks & Regards,
Sachin Vyas.

bry-c · July 21, 2019, 12:45pm

Hi @ssharma7884,

Yes, it's better to enable sniffing, as mentioned in the documentation.
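
For reference, a minimal sketch of turning sniffing on, assuming the client options sniff_on_start, sniff_on_connection_fail and sniffer_timeout from the client version current at the time of this thread (newer versions may name these differently, so check the docs for your version). build_client_kwargs and make_client are hypothetical helper names, and the hosts and credentials are placeholders:

```python
def build_client_kwargs(user, password, maxsize=25, sniffer_timeout=60):
    """Collect the keyword arguments for a long-running client."""
    return {
        "http_auth": (user, password),
        "maxsize": maxsize,                  # per-node connection limit
        "sniff_on_start": True,              # discover nodes before the first request
        "sniff_on_connection_fail": True,    # refresh the node list when a node fails
        "sniffer_timeout": sniffer_timeout,  # seconds between periodic re-sniffs
    }


def make_client(hosts, **kwargs):
    # Imported here so the rest of the sketch runs without the package installed.
    from elasticsearch import Elasticsearch
    return Elasticsearch(hosts, **kwargs)


kwargs = build_client_kwargs("<my_user>", "<my_user_pwd>")
print(sorted(kwargs))
# es = make_client(["host1", "host2", "host3"], **kwargs)  # needs a reachable cluster
```

The last line is left commented out because sniff_on_start contacts the cluster as soon as the client is created.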

system · August 18, 2019, 12:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
