A while ago we benchmarked the REST client against the transport client (see https://www.elastic.co/blog/benchmarking-rest-client-transport-client for details and https://github.com/elastic/elasticsearch/tree/master/client/benchmark for the source code). Note that we used the low-level REST client because back then there was no high-level REST client.
From your description I do not understand how you measure "search time" or "timings". When benchmarking Elasticsearch, I distinguish between three related metrics:
- took
- service time
- latency
In order not to repeat myself, I'm basing this description on the Rally docs:
> `took` is the time needed by Elasticsearch to process a request. As it is determined on the server, it can neither include the time it took the client to send the data to Elasticsearch nor the time it took Elasticsearch to send it to the client. This time is captured by service time, i.e. it is the time period from the start of a request (on the client) until it has received the response.
>
> The explanation of latency is a bit more involved. Imagine you want to grab a coffee on your way to work. You make this decision independently of all the other people going to the coffee shop, so it is possible that you need to wait before you can tell the barista which coffee you want. The time it takes the barista to make your coffee is the service time. The service time is independent of the number of customers in the coffee shop. However, you as a customer also care about the length of the waiting line, which depends on the number of customers in the coffee shop. The time between you entering the coffee shop and taking your first sip of coffee is latency.
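The coffee-shop analogy can be reproduced with a tiny single-server queue simulation (a sketch with made-up numbers, not measurements from your system). Note how the service time stays fixed while latency grows with load:

```python
import random

random.seed(42)

def simulate(arrival_rate, service_time_s, n_requests=10_000):
    """Single-server FIFO queue: a request's latency is its waiting
    time in the queue plus the (fixed) service time."""
    t_arrival = 0.0
    server_free_at = 0.0
    total_latency = 0.0
    for _ in range(n_requests):
        # Poisson arrivals: exponentially distributed inter-arrival times.
        t_arrival += random.expovariate(arrival_rate)
        start = max(t_arrival, server_free_at)  # wait while the "barista" is busy
        server_free_at = start + service_time_s
        total_latency += server_free_at - t_arrival
    return total_latency / n_requests

service_time = 0.050  # 50 ms per request, independent of load
for rate in (5, 15, 19):  # requests per second; capacity is 1 / 0.05 = 20 per second
    mean_latency_ms = simulate(rate, service_time) * 1000
    print(f"{rate:>2} req/s -> mean latency {mean_latency_ms:.0f} ms")
```

As the arrival rate approaches the system's capacity of 20 requests per second, mean latency climbs far above the 50 ms service time, purely due to time spent in the waiting line.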
I am not sure whether LoadRunner is able to measure latency accurately or whether it is actually only measuring service time. Either way, based on the numbers you present, I have the impression that the system is completely saturated:
You are issuing 350 requests per second (50 users * 7 queries per second each = 350 queries per second). At this rate, I'd expect a non-saturated system to respond in the worst (not average!) case in (1 / 7) s ≈ 143 ms. However, you are already reporting average response times of 4 to 5 seconds. You should check `took` in the responses. If it is significantly lower than, say, 4 seconds, this suggests that your query spends the majority of its time in the search queue (i.e. the waiting line), which indicates that you have overloaded the system. If `took` is indeed in this range, the same rationale applies: your target throughput was too large to begin with.
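To make that check concrete, here is a minimal sketch of comparing the server-side `took` against the client-measured service time. The response body and the 4200 ms client measurement are hypothetical placeholders, not numbers from your test:

```python
import json

# Hypothetical search response body; real ones come from your cluster.
raw_response = json.dumps({"took": 180, "timed_out": False,
                           "hits": {"total": {"value": 42}}})

# What the client (e.g. LoadRunner) measured for the same request, in ms.
client_service_time_ms = 4200

body = json.loads(raw_response)
took_ms = body["took"]  # time Elasticsearch spent processing the request
queue_and_transfer_ms = client_service_time_ms - took_ms

print(f"took: {took_ms} ms, service time: {client_service_time_ms} ms")
if queue_and_transfer_ms > took_ms:
    print("Most time was spent waiting, not searching -> likely saturated.")
```

If the gap between the two numbers dominates, the request spent its life in the search queue rather than being executed.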
From my perspective you can do one of two things:
- Reduce your target throughput so that you do not drive the system into saturation. You would not operate it in this mode in production anyway.
- Increase the system's capacity to actually handle the load.
You should also check out Relating Service Utilisation to Latency for more background; it's a very eye-opening article if you are new to queuing theory.
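To see why latency explodes near saturation, the classic M/M/1 queueing model is enough: mean latency is W = 1 / (mu - lambda), where mu is the service rate the system can sustain and lambda the arrival rate. The capacity figure of 400 queries per second below is an arbitrary assumption for illustration:

```python
def mm1_latency(arrival_rate, service_rate):
    """Mean latency in an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # saturated: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

mu = 400.0  # assumed capacity: 400 queries per second
for lam in (100, 200, 350, 390):
    print(f"{lam} q/s -> mean latency {mm1_latency(lam, mu) * 1000:.0f} ms")
```

Going from 50% to 97.5% utilization multiplies latency many times over even though throughput barely changes, which is exactly the effect the article describes.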