High Request Overhead Not Captured by Took

Hey all! I've been trying to nail down where I can improve the performance of my Elasticsearch cluster, and specifically to find out what is causing some high overhead in my end-to-end request times.

I'm using Elasticsearch 6.4.2 and my client is Elasticsearch.Net 6.3.1 (NEST in .NET). My cluster is deployed on GKE with six nodes. I have 11 indices, each with a single primary shard and 5 replicas. None of my indices is larger than about 4GB.

I'm capturing end-to-end request latency as well as the took value that Elasticsearch reports back. The took value is regularly only about 50% of the end-to-end request latency, which is roughly 600ms on average. Even when I query with size: 0, I'm still seeing that overhead. I've run pod-to-pod requests within my Kubernetes cluster and measured network latency at roughly 15-40ms, so I've ruled out network latency as the main cause.
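For context, the way I'm getting these two numbers is essentially a stopwatch around the awaited call compared against the took value on the response. Simplified sketch (the document type and index name are placeholders, not our actual code):

```csharp
using System.Diagnostics;
using System.Threading.Tasks;
using Nest;

public class MyDocument { }

public static class SearchTiming
{
    // Sketch: compare the wall-clock time of the awaited call against the
    // server-side "took" value that Elasticsearch reports in the response.
    public static async Task<(long endToEndMs, long tookMs)> TimeSearchAsync(IElasticClient client)
    {
        var stopwatch = Stopwatch.StartNew();

        var response = await client.SearchAsync<MyDocument>(s => s
            .Index("my-index")   // placeholder index name
            .Size(0));           // the overhead shows up even with size: 0

        stopwatch.Stop();

        // Anything above "took" is client serialization, transport, and
        // server-side request/response handling that Elasticsearch doesn't measure.
        return (stopwatch.ElapsedMilliseconds, response.Took);
    }
}
```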

My NEST queries are quite long because I'm doing a complex faceted multi-match query across my 11 indices. This requires a bunch of aggregations, filters, and post-filters, and I'm also including a highlighter and score functions. An average NEST query serializes to about 800 lines of JSON; a stripped-down sketch of the shape is below.
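To give a rough sense of that shape, here's a heavily stripped-down, hypothetical version (field names, index pattern, and document type are made up; the real query carries far more aggregations, filters, highlight fields, and score functions):

```csharp
using System.Threading.Tasks;
using Nest;

public class MyDocument { }

public static class FacetedSearch
{
    // Hypothetical, simplified shape of one faceted multi-match query.
    public static Task<ISearchResponse<MyDocument>> SearchAsync(
        IElasticClient client, string searchText, string selectedCategory)
    {
        return client.SearchAsync<MyDocument>(s => s
            .Index("my-index-*")                               // placeholder for the 11 indices
            .Query(q => q
                .FunctionScore(fs => fs
                    .Query(fq => fq
                        .MultiMatch(mm => mm
                            .Fields(f => f.Field("title").Field("description"))
                            .Query(searchText)))
                    .Functions(fn => fn
                        .FieldValueFactor(fvf => fvf.Field("popularity").Factor(1.2)))))
            .PostFilter(pf => pf.Term("category.keyword", selectedCategory))
            .Aggregations(a => a
                .Terms("facet_category", t => t.Field("category.keyword"))
                .Terms("facet_brand", t => t.Field("brand.keyword")))
            .Highlight(h => h
                .Fields(hf => hf.Field("description"))));
    }
}
```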

Is this overhead really only due to serialization? Is there any way I can pare this down?

Hey all, just wanted to include this chart that shows a breakdown of our end-to-end request latency so y'all can see what I'm describing. search-took is the took value that Elasticsearch reports back, whereas search-query-ms-excluding-took is purely the time spent in the awaited ElasticClient.SearchAsync call minus the took value.

(search-handler-ms-excluding-query and search-latency-web-full are additional overhead where we're pulling data from our database, etc.)

You can see how the search-query-ms-excluding-took regularly exceeds took. What's going on?

Hi @Zach_Diemer,

I think the problem here is the combination of relatively high latency in your cluster and complicated queries. Assuming you have an average network latency of 30ms and you're seeing 300ms, then 60ms of that is probably already explained by network latency (since there's quite some time between the server receiving the request and responding, I'm counting the latency roughly as two separate network trips: one for the request and one for the response).
Sending 800 lines of JSON and deserialising it will definitely take non-trivial time on the server, so that is likely another chunk of your wall-clock time.
Your client will also take non-trivial time to serialise those queries in the first place.
The same then goes for the query response: it has to be serialised on the Elasticsearch end and deserialised by the client.
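As a diagnostic, NEST can also capture the raw request and response bodies (plus a per-call audit trail) if you enable DisableDirectStreaming on the connection settings, which at least shows you how large the payloads being (de)serialised actually are. A rough sketch, with a placeholder node URI and query:

```csharp
using System;
using System.Threading.Tasks;
using Nest;

public static class SerializationDiagnostics
{
    // Sketch: capture raw request/response bytes to see how large the payloads are.
    // DisableDirectStreaming buffers the bodies in memory and adds overhead of its
    // own, so use it for diagnosis only, not in production.
    public static async Task DumpDebugInfoAsync(string searchText)
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200")) // placeholder URI
            .DisableDirectStreaming();

        var client = new ElasticClient(settings);

        var response = await client.SearchAsync<object>(s => s
            .AllIndices()
            .Query(q => q.QueryString(qs => qs.Query(searchText))));

        // Includes the request body, response body, and an audit trail
        // with timings for this API call.
        Console.WriteLine(response.DebugInformation);
    }
}
```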

One way of gaining some insight into how much serialisation adds to the end-to-end latency would be to record a few queries as JSON strings/files and run them via curl (or some other REST client), which isolates the time cost of serialising the query on the client.
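Something along these lines, for example (host, index, and file names are placeholders); curl's %{time_total} gives you an end-to-end number to compare against the took field inside the saved response:

```bash
# Replay a captured query body directly against a node, bypassing NEST entirely.
curl -s -XPOST 'http://elasticsearch:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  --data-binary @captured-query.json \
  -o response.json \
  -w 'end-to-end: %{time_total}s\n'
# Then compare the printed time with the "took" field inside response.json.
```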
