Concurrent requests and queues

Nodes do not share thread pool queues. The queues are not shareable at all; each one lives inside its node's isolated JVM.
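You can see this for yourself with the cat thread pool API. Here is a minimal sketch with the Python elasticsearch client (the host name is a placeholder, and the exact keyword arguments vary a bit between client versions):

```python
from elasticsearch import Elasticsearch

# Connect to any node in the cluster (placeholder address).
es = Elasticsearch(["http://node1:9200"])

# One row per node and pool: each node reports its *own*
# active/queue/rejected counters - there is no shared queue.
print(es.cat.thread_pool(
    v=True,
    h="node_name,name,active,queue,rejected",
))
```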

What confuses me is that you talk about "transactions" - this is the wrong picture. There are no transactions at all. Each JVM (each ES node) holds its own copy of the global cluster state and does not need to coordinate transactions across nodes.
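You can observe those cluster state copies directly. A sketch, again with the Python client and placeholder addresses - `local=True` asks the contacted node for its own copy rather than fetching it from the master:

```python
from elasticsearch import Elasticsearch

# Ask two different nodes for their local copy of the cluster state
# (placeholder addresses - use your own nodes).
for host in ["http://node1:9200", "http://node2:9200"]:
    es = Elasticsearch([host])
    # local=True returns the state held by the contacted node itself,
    # not one fetched from the master - no cross-node transaction.
    state = es.cluster.state(local=True, metric="version")
    print(host, "cluster state version:", state)
```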

In the case where you addressed all three nodes in your test, you created a skew: the data-less node must always route requests to other nodes, while the data nodes only need to route about 50% of requests (assuming a uniform shard distribution, which I doubt given the numbers you posted). Routing is an expensive operation - it adds at least 5-10ms of latency - and with 1000 operations this latency effect dominates over the Elasticsearch thread pool queues. It is also not clear whether you used document/shard routing; those parameters can create hot spots, as in the sketch below.
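For reference, document routing works like this: every document with the same routing value hashes to the same shard, which is exactly how hot spots appear. A minimal sketch (the index name and routing key are made up, and the `body` argument reflects older client versions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://node1:9200"])

# All documents indexed with the same routing value hash to the same
# shard - convenient for locality, but a potential hot spot under load.
es.index(
    index="myindex",            # placeholder index name
    id="1",
    routing="customer-42",      # placeholder routing key
    body={"field": "value"},
)

# A search with the same routing value hits only that one shard.
es.search(index="myindex", routing="customer-42",
          body={"query": {"match_all": {}}})
```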

So you should address the nodes in a way that avoids the skew: either submit requests to the master node only (not recommended), or to the data nodes only with a uniform data distribution, or (a case you did not test) set up dedicated client nodes that do nothing but routing.
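In client terms, avoiding the skew is just a matter of which hosts you hand to the client. A sketch, assuming the Python client and placeholder host names:

```python
from elasticsearch import Elasticsearch

# Option 1: talk to the data nodes only (placeholder addresses), so
# every contacted node holds shards and only routes a fraction of
# the requests elsewhere.
es_data = Elasticsearch(["http://datanode1:9200",
                         "http://datanode2:9200"])

# Option 2: talk to dedicated client (coordinating) nodes only - they
# hold no data and do nothing but routing, so there is no skew among
# the data nodes.
es_client = Elasticsearch(["http://clientnode1:9200"])
```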

If you start 1000 threads simultaneously, be aware that your test client will execute them in batches according to the number of threads the machine can actually run (no machine can execute 1000 threads in hardware at once). But maybe you mean 1000 connections.
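If you want the test to measure the server-side queues rather than client-side scheduling, bound the concurrency explicitly. A sketch with Python's standard library (the worker count and the request function are illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://node1:9200"])

def one_request(i):
    # Placeholder operation - replace with your actual test request.
    return es.search(index="myindex",
                     body={"query": {"match_all": {}}})

# 1000 submitted tasks, but only max_workers run at a time - the rest
# wait in the executor's own queue, which is exactly the batching
# described above.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(one_request, range(1000)))
```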