Concurrent requests and queue

Hi,
I am working on the performance of Elasticsearch under concurrent requests; this is my latest test:

The test was made with 3 m4.large instances on Amazon AWS:

1 instance with one master node (index queue: 1000), used only for routing
2 instances, each with one data node (index queue: 1000), used for indexing

We sent 1000 concurrent transactions with 1000 threads starting simultaneously

We checked the state of the queues and found, for example:

First data node: 325
Second data node: 13
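For reference, queue levels like these can be read from the cat thread pool API; a minimal sketch with a placeholder host:

```python
import requests

# Each node reports its own thread pool stats; the "queue" column shows
# how many requests are waiting on that particular node.
resp = requests.get("http://localhost:9200/_cat/thread_pool?v")
print(resp.text)
```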

Now if we send 2000 concurrent transactions, the first node rejects transactions (because its queue is full) that could have been processed by the second node (whose queue still has room).
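(As an aside: when a queue is full, the node answers the request with a rejection, and a client typically retries with a backoff. A minimal sketch, assuming the rejection arrives as HTTP 429 and using a hypothetical helper and document URL:)

```python
import time
import requests

def index_with_retry(url, doc, max_retries=5):
    # Hypothetical helper: a full queue is reported as HTTP 429;
    # back off and retry so the queue has time to drain.
    for attempt in range(max_retries):
        resp = requests.post(url, json=doc)
        if resp.status_code != 429:
            return resp
        time.sleep(0.1 * 2 ** attempt)
    return resp
```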

Is this a normal situation, or have we missed something?

Is it possible to balance the queries between the two data nodes in real time?

Thank you for your feedback

Nodes do not share thread pool queues; the queues are not shareable at all, because each node runs in its own isolated JVM.

What confuses me is that you talk about "transactions" - this is the wrong picture. There are no transactions at all. Each JVM (each ES node) owns a copy of the global cluster state and does not need to create transactions across nodes.

Since you addressed all three nodes in your test, you created a skew - the data-less master node must always route, while each data node must route only 50% of the requests (assuming uniform shard distribution, which I doubt given the numbers you quote). Routing is an expensive operation - at least 5-10ms of latency - and with 1000 operations you see the latency effect dominating over the Elasticsearch thread pool queues. Also, it is not clear whether you used document/shard routing; these parameters can create hot spots.
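To illustrate the routing point: every document indexed with the same routing value is hashed to the same shard, so a skewed distribution of routing values skews the load. A minimal sketch with a 2.x-era Python client (index name and routing value are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# All documents sharing a routing value land on the same shard, so one
# very common value turns a single shard into a hot spot.
es.index(index="docs", doc_type="doc", routing="customer-42",
         body={"message": "hello"})
```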

So you should address the nodes in a manner that avoids skew: either submit requests to the master node only (not recommended), or to the data nodes only with uniform data distribution, or (you did not test this case) set up dedicated client nodes for routing.
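For the data-nodes-only option, a sketch of a client that addresses just the data nodes; the Python client round-robins requests across the hosts it is given, which also answers your question about balancing between the two data nodes (hostnames are hypothetical):

```python
from elasticsearch import Elasticsearch

# List only the nodes that should receive requests; the client rotates
# across these hosts call by call, spreading the load between them.
es = Elasticsearch([
    "http://datanode1:9200",
    "http://datanode2:9200",
])
```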

If you start 1000 threads simultaneously, you should be aware that your test client will execute the threads in batches, according to how many threads the machine can actually run at once (no machine can execute 1000 threads in hardware). But maybe you mean 1000 connections.
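For example, a Python load test typically drives its requests through a bounded worker pool - 1000 submitted tasks, but far fewer threads actually running at any moment (target URL is hypothetical):

```python
import concurrent.futures
import requests

def send_doc(i):
    # One indexing request per task; hypothetical endpoint.
    return requests.post("http://localhost:9200/docs/doc",
                         json={"n": i}).status_code

# 1000 tasks are submitted, but only max_workers run at once; the rest
# wait in the executor's queue, much like requests wait in a node's queue.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(send_doc, range(1000)))
print(sum(1 for s in results if s == 429), "rejections")
```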

Thank you for your clear answer. Sorry if I was unclear on some points; I am a student currently learning Elasticsearch.

I did some more research on how to handle many concurrent requests according to Elasticsearch best practices.

We will have thousands of terminals that will send documents to Elasticsearch for indexing through the Elasticsearch API.

Could you review the hypothetical infrastructure below?

  • We need two or three client nodes to handle the HTTP requests; we can use load balancing to distribute the requests between the client nodes.

  • We need three dedicated master (non-data) nodes. (Three seems to be the recommended number, so the cluster can still elect a master if one goes down.)

  • We will have 5 or 6 data nodes.
    I read that the number of shards should be 1.5 to 3 times the number of nodes. So if we use 6 data nodes, should we set between 9 and 18 shards? (See the sketch after this list.)
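A minimal sketch of how such a shard count is fixed at index creation (index name and host are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://clientnode1:9200"])

# number_of_shards is fixed when the index is created; 12 primaries on
# 6 data nodes would put 2 primaries on each node (plus replicas).
es.indices.create(index="documents", body={
    "settings": {
        "number_of_shards": 12,
        "number_of_replicas": 1,
    }
})
```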

Also, for visualisation, we can have two client nodes dedicated to Kibana, and use another load balancer for these two nodes.

Thank you very much for your help!