Question on REST Client and Java API

I am comparing the performance of the Elasticsearch REST client and the Java API using Load Runner (a load of 50 users, each issuing 7 queries/s).

Just a quick background on the indexed data: it consists of news/wiki articles with typical metadata such as date, author, tags, etc. There are a few million such articles indexed.

One observation is that with the REST client, establishing the connection is very fast, but the search time is slower than with the Java API.

The timings are as follows:

  • For the REST client, the average is 5 s and the 90th percentile is 10 s.
  • For the Java API, the average is 4 s and the 90th percentile is 6 s.

Given that the REST client is the way forward, are there settings or parameters a developer could use to make the REST client return results faster?

Will the Elastic team continue to work on the REST client API as well?

Thank You

Can you share more details about your tests, like the scenario used?

I'm sure @javanna will love to hear more.

I would also like to know more about the test setup.

My theoretical expectation of a 15-20% performance degradation for the Java HTTP client compared to the native Java client matches your results, but I would like to write my own test program to verify this.

If it's not possible to open-source the test program, could you describe what components you use (besides "Load Runner", probably HP LoadRunner?), how many articles "a few million" actually is, what queries you use, and what cluster size and JVMs you use?

And did you also test bulk indexing?

I am sure @danielmitterdorfer is interested too as he ran some benchmarks a while ago.

Hi @Ong,

a while ago we benchmarked the REST client against the transport client (see https://www.elastic.co/blog/benchmarking-rest-client-transport-client for details and https://github.com/elastic/elasticsearch/tree/master/client/benchmark for the source code). Note that we used the low-level REST client because back then there was no high-level REST client.
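For reference, here is a minimal sketch of how the two clients are constructed (host names and ports are placeholders; the transport client shown uses the 6.x-era API, and in 5.x the address class is InetSocketTransportAddress instead of TransportAddress):

```java
import java.net.InetAddress;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class ClientSetup {
    public static void main(String[] args) throws Exception {
        // Transport client: speaks the internal transport protocol, usually on port 9300.
        TransportClient transportClient = new PreBuiltTransportClient(Settings.EMPTY)
            .addTransportAddress(new TransportAddress(InetAddress.getByName("localhost"), 9300));

        // Low-level REST client: plain HTTP against port 9200.
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

        transportClient.close();
        restClient.close();
    }
}
```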

From your description I do not understand how you measure "search time" or "timings". When benchmarking Elasticsearch, I distinguish between three related metrics:

  • took
  • service time
  • latency

In order not to repeat myself, I'm basing this description on the Rally docs:

took is the time needed by Elasticsearch to process a request. As it is determined on the server, it can include neither the time it took the client to send the data to Elasticsearch nor the time it took Elasticsearch to send the response back to the client. That end-to-end time is captured by service time, i.e. the time period from the start of a request (on the client) until the client has received the response.
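As a rough illustration (not a proper benchmark), the following sketch captures both metrics from the client side with the low-level REST client. The index name, the query, and the regex-based extraction of took are placeholders/simplifications, and the performRequest variant shown is the 5.x/6.x-era signature:

```java
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class TookVsServiceTime {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            long start = System.nanoTime();
            // "articles" and the query string are placeholders for your own index and query.
            Response response = client.performRequest("GET", "/articles/_search",
                Collections.singletonMap("q", "tags:elasticsearch"));
            long serviceTimeMillis = (System.nanoTime() - start) / 1_000_000;

            String body = EntityUtils.toString(response.getEntity());
            // took is reported by Elasticsearch itself and does not include network transfer
            // or any waiting on the client side.
            Matcher m = Pattern.compile("\"took\":(\\d+)").matcher(body);
            long tookMillis = m.find() ? Long.parseLong(m.group(1)) : -1;

            System.out.println("took (server-side):          " + tookMillis + " ms");
            System.out.println("service time (client-side):  " + serviceTimeMillis + " ms");
        }
    }
}
```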

The explanation of latency is a bit more involved. Imagine you want to grab a coffee on your way to work. You make this decision independently of all the other people going to the coffee shop so it is possible that you need to wait before you can tell the barista which coffee you want. The time it takes the barista to make your coffee is the service time. The service time is independent of the number of customers in the coffee shop. However, you as a customer also care about the length of the waiting line which depends on the number of customers in the coffee shop. The time it takes between you entering the coffee shop and taking your first sip of coffee is latency.

I am not sure whether Load Runner is able to measure latency accurately or if it is actually only measuring service time. Anyway, based on the numbers you present I have the impression that the system is completely saturated:

You are issuing 350 requests per second (7 queries/s * 50 users = 350 queries per second). Since each user issues a new query every (1 / 7) s ≈ 143 ms, I'd expect a non-saturated system to respond in the worst (not average!) case within roughly 143 ms; otherwise requests start to pile up. However, you are already reporting average response times of 4 to 5 seconds. You should check took in the responses. If it is significantly lower than - say - 4 seconds, this suggests that your query spends the majority of its time in the search queue (i.e. the waiting line), which indicates that you have overloaded the system. If took is indeed in this range, the same rationale applies: your target throughput was too large to begin with.
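One quick way to see whether the search queue is filling up while the load test runs is to poll the thread pool stats. Here is a minimal sketch using the low-level REST client and the _cat/thread_pool API (host and port are placeholders):

```java
import java.util.Collections;

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class SearchQueueCheck {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Ask only for the search thread pool; a growing "queue" column and non-zero
            // "rejected" counts during the test are a sign of a saturated cluster.
            Response response = client.performRequest("GET", "/_cat/thread_pool/search",
                Collections.singletonMap("v", "true"));
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```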

From my perspective you can do one of two things:

  • Reduce your target throughput so that you do not drive the system into saturation; you would not operate it in this mode in production either.
  • Increase the system's capacity to actually handle the load.

You should also check out Relating Service Utilisation to Latency for more background; it's a very eye-opening article if you are new to queuing theory.

Daniel
