Understanding RestClient.java and its use of DEFAULT_MAX_CONN_TOTAL

bdbusches · December 14, 2020, 2:03pm

We're having difficulty using RestClient against a large, 700-node ES cluster. Under a moderate search load (47 concurrent searches) we see:

extensive use of I/O dispatcher threads (based on the default I/O reactor configuration in the httpclient library you use)
apparent queuing of requests

We don't understand how the DEFAULT_MAX_CONN_TOTAL and DEFAULT_MAX_CONN_PER_ROUTE defaults in RestClientBuilder.java interact. I've been scouring the httpclient/httpcore source code to figure it out.
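To make the interaction between the two limits concrete, here is a simplified model (not the actual httpasyncclient pool code). It assumes requests are spread round-robin over the nodes (each node is one "route"), and that a request can only lease a connection if BOTH its route is under the per-route cap AND the whole pool is under the total cap; anything else waits in a queue. The class and method names are hypothetical; the 10/30 values are the RestClientBuilder defaults mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of a pool with a per-route cap and a total cap (hypothetical). */
final class ConnLimitModel {

    // RestClientBuilder defaults, as quoted in this thread.
    static final int DEFAULT_MAX_CONN_PER_ROUTE = 10;
    static final int DEFAULT_MAX_CONN_TOTAL = 30;

    /**
     * Spreads `requests` concurrent requests round-robin over `routes` nodes and
     * returns how many can lease a connection immediately. The rest would queue
     * until a connection is released back to the pool.
     */
    static int leasedImmediately(int requests, int routes, int maxPerRoute, int maxTotal) {
        Map<Integer, Integer> perRoute = new HashMap<>();
        int total = 0;
        for (int i = 0; i < requests; i++) {
            int route = i % routes;                        // round-robin node selection (assumption)
            int inUse = perRoute.getOrDefault(route, 0);
            if (inUse < maxPerRoute && total < maxTotal) { // both caps must have room
                perRoute.put(route, inUse + 1);
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // 47 concurrent searches through ONE shared client, 700 nodes
        System.out.println(leasedImmediately(47, 700, DEFAULT_MAX_CONN_PER_ROUTE, DEFAULT_MAX_CONN_TOTAL));
        // the same load against a single node (a single route)
        System.out.println(leasedImmediately(47, 1, DEFAULT_MAX_CONN_PER_ROUTE, DEFAULT_MAX_CONN_TOTAL));
    }
}
```

Under these assumptions the program prints 30 and then 10: with 700 nodes the total cap is the binding one, so 17 of the 47 searches queue, while against a single node the per-route cap of 10 binds first. Both caps can be raised through RestClientBuilder's setHttpClientConfigCallback, which hands you an HttpAsyncClientBuilder with setMaxConnTotal and setMaxConnPerRoute.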

Main question: when we create a new RestClient against a SLOW ES cluster (it can take hours to retrieve 250k docs) and call performRequest() repeatedly with scroll IDs, are we creating new connections?

We can't recreate the problem at our office on smaller clusters unless we can "slow down" ES per query. Is there some trick to make ES take minutes per query/page/scroll instead of milliseconds/seconds, just to keep these connections open longer so we can test the JVM/threading issues?
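One way to exercise client-side threading and connection behaviour without a slow cluster (an assumption, not something suggested in this thread) is to point the client at a local stub that delays every response. The sketch below uses the JDK's built-in com.sun.net.httpserver package; the class name, the /_search path and the JSON body are made up, and the stub does not behave like Elasticsearch beyond being slow.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

/** Hypothetical test stub: answers every request only after a fixed delay. */
final class SlowStubServer {

    /** Starts the stub on a free local port; each request waits delayMillis before the reply. */
    static HttpServer start(long delayMillis) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(delayMillis); // pretend the search / scroll page is slow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "{\"hits\":{\"hits\":[]}}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Daemon worker threads: many slow requests are served in parallel,
        // and the JVM can still exit once the server is stopped.
        server.setExecutor(Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "slow-stub");
            t.setDaemon(true);
            return t;
        }));
        server.start();
        return server;
    }

    /** Sends one GET to the stub and returns the elapsed time in milliseconds. */
    static long timedGet(int port) throws IOException {
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/_search").toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            in.readAllBytes();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(500);
        try {
            System.out.println("request took ~" + timedGet(server.getAddress().getPort()) + " ms");
        } finally {
            server.stop(0);
        }
    }
}
```

A RestClient built against new HttpHost("127.0.0.1", port, "http") would then keep each connection busy for the chosen delay, which is enough to watch dispatcher threads and pool queuing in a profiler or in the org.apache.http debug log.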

Thanks

bdbusches · December 14, 2020, 4:14pm

I understand it a bit more now. After enabling org.apache.http debug logging, I can see that we clearly didn't understand the HTTP client configuration and the defaults used in your RestClient (max of 10 connections per route, 30 total). The interaction with the default I/O reactor is also painful: we're on httpclient 4.x, where the default is to create NCORES * 2 "I/O dispatcher" threads. For us, that's 160 per client instance. With just 47 concurrent searches (one client each) we end up with roughly 7,500 threads (47 × 160 = 7,520) doing I/O! On top of that, with the per-route/total limits, we're not even sure what's happening, e.g. how requests are being queued. Appreciate any light.
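The thread count above is simple multiplication, sketched below. It takes the post's own assumptions at face value: each client instance starts its own I/O reactor with NCORES * 2 dispatcher threads, and the machine has 80 cores (160 / 2). The class and method names are hypothetical.

```java
/** Back-of-the-envelope count of I/O dispatcher threads (hypothetical helper). */
final class DispatcherThreadMath {

    /** Dispatcher threads started by one client instance, per the post: cores * multiplier. */
    static int threadsPerClient(int cores, int multiplier) {
        return cores * multiplier;
    }

    /** Every client instance has its own reactor, so the counts add up. */
    static int totalThreads(int clients, int cores, int multiplier) {
        return clients * threadsPerClient(cores, multiplier);
    }

    public static void main(String[] args) {
        int cores = 80; // 160 / 2, from the numbers in the post
        System.out.println("client per search (47 clients): " + totalThreads(47, cores, 2)); // 7520
        System.out.println("one shared client:              " + totalThreads(1, cores, 2));  // 160
    }
}
```

The multiplier is per client, so sharing one client removes the 47× factor. The per-client count itself can also be set explicitly by passing IOReactorConfig.custom().setIoThreadCount(n).build() to HttpAsyncClientBuilder.setDefaultIOReactorConfig inside RestClientBuilder's HTTP client config callback.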

DavidTurner · December 14, 2020, 4:58pm

No, the REST client should be reusing its connections across requests, including scroll requests.
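As an illustration of what connection reuse means here, the toy model below (hypothetical, not the real httpasyncclient code) keeps released connections on an idle list per route, so the next request to the same node takes an existing connection instead of opening a new one. It assumes keep-alive is honoured and each response is fully consumed before the connection is released; the route string is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy keep-alive pool: released connections are parked per route and reused. */
final class ReusePoolModel {

    record Conn(int id, String route) {}

    private final Map<String, Deque<Conn>> idle = new HashMap<>();
    private int created = 0;

    Conn lease(String route) {
        Conn c = idle.computeIfAbsent(route, r -> new ArrayDeque<>()).pollFirst();
        if (c == null) {
            c = new Conn(++created, route); // nothing idle for this route: "open" a new connection
        }
        return c;
    }

    void release(Conn c) {
        // keep the connection alive and available for the next request to this route
        idle.computeIfAbsent(c.route(), r -> new ArrayDeque<>()).addFirst(c);
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ReusePoolModel pool = new ReusePoolModel();
        // 100 sequential scroll pages against one node through ONE client
        for (int page = 0; page < 100; page++) {
            Conn c = pool.lease("es-node-1:9200");
            pool.release(c); // response consumed, connection handed back
        }
        System.out.println("connections opened: " + pool.createdCount()); // 1
    }
}
```

In this model a fresh pool per search throws that reuse away, because each new pool starts with an empty idle list.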

Thousands of I/O threads for 47 searches sounds very wrong. Are you creating a client per search? You should normally have only a single instance of the client, and it should live until your application shuts down.
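One common way to get "a single instance for the life of the application" in Java is the initialization-on-demand holder idiom, sketched below. SearchClient is a hypothetical stand-in for an expensive, thread-safe client such as RestClient; a real application would build and configure the RestClient inside the holder instead. The shutdown hook is one assumed way to close it at exit.

```java
import java.io.Closeable;

/** "One client per application" via the initialization-on-demand holder idiom. */
final class SharedClient {

    /** Hypothetical stand-in for an expensive, thread-safe client. */
    static final class SearchClient implements Closeable {
        static int instances = 0; // how many clients were ever built (holder guarantees one)

        SearchClient() {
            instances++;
        }

        @Override
        public void close() {
            // a real client would release its connection pool and I/O threads here
        }
    }

    private SharedClient() {}

    private static final class Holder {
        // Built lazily and thread-safely by the JVM on first access to get().
        static final SearchClient INSTANCE = new SearchClient();

        static {
            // Close once at JVM shutdown, not after every search.
            Runtime.getRuntime().addShutdownHook(new Thread(INSTANCE::close));
        }
    }

    /** Every search task calls this instead of constructing its own client. */
    static SearchClient get() {
        return Holder.INSTANCE;
    }
}
```

Because class initialization is guaranteed to run once, every thread in the task manager's pool sees the same instance without explicit locking.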

bdbusches · December 16, 2020, 11:31pm

It is wrong. Our app is a GIS app that supports dynamic, user-driven searches across thousands of different layers, all served by about 10 types of connections (ES, PostgreSQL, MemSQL, REST/WFS, MongoDB, etc.). Our task manager queues searches and works them off from a Java thread pool. By default, it creates a NEW RestClient for each search, similar to how we create a new PostgreSQL/PostGIS connection per search (which is fine/expected for that kind of data source).

Thanks. We're now building in the capability to keep a single static RestClient and set all those configs explicitly.

DavidTurner · December 17, 2020, 6:07am

I'm not sure that's good either; at least it wasn't when I last used PostgreSQL. Each connection to PostgreSQL spawns its own backend process, which can take many milliseconds to start and definitely runs into problems with high numbers of concurrent clients. There are workarounds like pgBouncer, but IMO that's a bit of a hack compared with using an in-app connection pool.
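The core idea of an in-app pool is small: create at most N connections, hand idle ones out, and make extra callers wait rather than open more. Below is a minimal generic sketch under that assumption (a hypothetical class, not any particular pooling library); in practice a JDBC application would use an established pool, which also handles validation, eviction and leaks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Minimal bounded pool: never more than `size` resources; callers wait for a free one. */
final class BoundedPool<T> {

    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int size;
    private int created = 0; // guarded by `this`

    BoundedPool(int size, Supplier<T> factory) {
        this.size = size;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(size);
    }

    /** Returns a resource, or null if none became free within the timeout. */
    T borrow(long timeout, TimeUnit unit) throws InterruptedException {
        T t = idle.poll();                 // reuse an idle resource if there is one
        if (t != null) {
            return t;
        }
        synchronized (this) {
            if (created < size) {          // still allowed to open a new one
                created++;
                return factory.get();
            }
        }
        return idle.poll(timeout, unit);   // pool exhausted: wait for a giveBack()
    }

    /** Returns a borrowed resource; capacity suffices because at most `size` exist. */
    void giveBack(T t) {
        idle.offer(t);
    }

    synchronized int createdCount() {
        return created;
    }
}
```

With a pool of, say, 10 per data source shared by the task manager's threads, a burst of searches waits briefly for a connection instead of spawning one backend process per search.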

system · January 14, 2021, 6:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

