I am comparing the performance of the Elasticsearch REST client and the Java API using Load Runner, with a load of 50 users each issuing 7 queries/s.
Some quick background on the indexed data: it consists of news/wiki articles with typical metadata such as date, author, and tags. There are a few million such articles indexed.
One observation is that the REST client establishes its connection very quickly, but its search time is slower than the Java API's.
The timings are as follows:
REST client: average 5 s, 90th percentile 10 s
Java API: average 4 s, 90th percentile 6 s
Given that the REST client is the way forward, are there settings or parameters a developer could use to make the REST client return results faster? For example, are any of the knobs on the low-level client builder relevant here? A sketch of what I mean below (the host, timeouts, and thread count are placeholders, not our actual configuration):
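```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;

public class ClientSetup {
    public static RestClient build() {
        return RestClient.builder(new HttpHost("localhost", 9200, "http"))
            // A larger socket timeout keeps slow searches from being cut off
            // client-side; it does not make the searches themselves faster.
            .setRequestConfigCallback(requestConfig -> requestConfig
                .setConnectTimeout(5_000)    // ms, placeholder value
                .setSocketTimeout(60_000))   // ms, placeholder value
            // More I/O threads can help when many concurrent requests
            // share a single client instance.
            .setHttpClientConfigCallback(httpClient -> httpClient
                .setDefaultIOReactorConfig(IOReactorConfig.custom()
                    .setIoThreadCount(4)     // placeholder value
                    .build()))
            .build();
    }
}
```
Are any of these worth tuning, or is the bottleneck likely elsewhere?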
Will the Elastic team also continue working on the REST client API?
I would also like to know more about the test setup.
My theoretical expectation of a 15-20% performance degradation for the Java HTTP client compared to the native Java client matches your results, but I would like to write my own test program to verify this.
If it is not possible to open-source the test program, could you describe which components you used (besides "Load Runner", probably HP LoadRunner?), how many documents "a few million articles" actually is, which queries you ran, and what cluster size and JVMs you used?
From your description I do not understand how you measure "search time" or "timings". When benchmarking Elasticsearch, I distinguish between three related metrics:
took
service time
latency
In order not to repeat myself, I'm basing this description on the Rally docs:
took is the time Elasticsearch needs to process a request. As it is determined on the server, it includes neither the time it took the client to send the data to Elasticsearch nor the time it took Elasticsearch to send the response back. That is captured by service time, i.e. the period from the start of a request on the client until the client has received the response.
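As a concrete illustration, here is a minimal sketch using the low-level REST client that records both numbers for a single request (the index name and query are made up, and the regex extraction of took is a shortcut; a real benchmark would use a JSON parser):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class TookVsServiceTime {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("GET", "/articles/_search"); // placeholder index
            request.setJsonEntity("{\"query\":{\"match\":{\"tags\":\"news\"}}}");

            long start = System.nanoTime();
            Response response = client.performRequest(request);
            long serviceTimeMs = (System.nanoTime() - start) / 1_000_000; // client wall clock

            String body = EntityUtils.toString(response.getEntity());
            Matcher m = Pattern.compile("\"took\":(\\d+)").matcher(body);
            long tookMs = m.find() ? Long.parseLong(m.group(1)) : -1; // server-side time

            System.out.printf("took=%d ms, service time=%d ms%n", tookMs, serviceTimeMs);
        }
    }
}
```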
The explanation of latency is a bit more involved. Imagine you want to grab a coffee on your way to work. You make this decision independently of all the other people going to the coffee shop so it is possible that you need to wait before you can tell the barista which coffee you want. The time it takes the barista to make your coffee is the service time. The service time is independent of the number of customers in the coffee shop. However, you as a customer also care about the length of the waiting line which depends on the number of customers in the coffee shop. The time it takes between you entering the coffee shop and taking your first sip of coffee is latency.
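To make the analogy concrete in code, here is a minimal sketch of how a fixed request schedule separates the two metrics (doSearch is a hypothetical stand-in for one search call; requests are supposed to fire every intervalMs):

```java
// Minimal sketch: latency vs. service time under a fixed request schedule.
static void measure(Runnable doSearch, int requests, long intervalMs) {
    long testStart = System.currentTimeMillis();
    for (int i = 0; i < requests; i++) {
        long scheduledStart = testStart + i * intervalMs; // when the request should fire
        long actualStart = System.currentTimeMillis();    // when we actually got to it
        doSearch.run();
        long end = System.currentTimeMillis();
        long serviceTimeMs = end - actualStart;  // the barista making your coffee
        long latencyMs = end - scheduledStart;   // waiting in line plus being served
        System.out.printf("service time=%d ms, latency=%d ms%n", serviceTimeMs, latencyMs);
    }
}
```

If doSearch keeps taking longer than intervalMs, actualStart drifts ever further past scheduledStart, so latency grows without bound while service time stays flat. That growing gap is exactly the waiting line.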
I am not sure whether Load Runner can measure latency accurately or whether it actually only measures service time. In any case, based on the numbers you present, I have the impression that the system is completely saturated:
You are issuing 350 requests per second (50 users * 7 queries/s = 350 queries/s). At that rate, each user fires a query every 1/7 s, so for a non-saturated system to keep up, even the worst-case (not average!) response time should be about (1 / 7) s ≈ 143 ms. However, you are already reporting average response times of 4 to 5 seconds. You should check took in the responses. If it is significantly lower than, say, 4 seconds, your queries spend the majority of their time in the search queue (i.e. the waiting line), which means you have overloaded the system. If took is indeed in this range, the same rationale applies: your target throughput was too high to begin with.
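The arithmetic as a sketch (user count and rate are taken straight from your post; the saturation threshold is only illustrative, and the queue estimate also includes network and serialization overhead):

```java
// Back-of-envelope saturation check using the numbers from the post.
static void saturationCheck(long tookMs, long serviceTimeMs) {
    int users = 50;
    double perUserQps = 7.0;
    double offeredQps = users * perUserQps;        // 350 queries/s in total
    double perUserBudgetMs = 1000.0 / perUserQps;  // ~143 ms to sustain 7 queries/s per user

    // Time spent outside the actual search: queueing, network, serialization.
    long queueEstimateMs = serviceTimeMs - tookMs;
    System.out.printf("offered load=%.0f qps, per-request budget=%.0f ms, "
        + "estimated overhead=%d ms%n", offeredQps, perUserBudgetMs, queueEstimateMs);
    if (queueEstimateMs > tookMs) {
        System.out.println("Most of the service time is queueing: the system is saturated.");
    }
}
```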
From my perspective you can do one of two things:
Reduce your target throughput so that you do not drive the system into saturation; you would not operate it in this mode in production anyway.
Increase the system's capacity so that it can actually handle the load.