Hey all! I've been trying to nail down where I can improve the performance of my Elasticsearch cluster, and specifically what is causing the high overhead in my end-to-end request latency.
I'm using Elasticsearch 6.4.2 and my client is Elasticsearch.Net 6.3.1 (NEST in .NET). I have my cluster deployed as a GKE deployment with six nodes. I have 11 indices, each with a single shard and 5 replicas. None of my indices are larger than about 4GB.
I'm capturing end-to-end request latency as well as the took value that Elasticsearch reports back in the response. The took value is regularly only about 50% of the end-to-end request latency, which is roughly 600ms on average. Even when I query using size: 0, I'm still seeing that overhead. I've done a pod-to-pod request within my Kubernetes cluster and found that the network latency is roughly 15-40ms, so network latency alone doesn't account for the roughly 300ms gap.
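To make the arithmetic explicit, here is a minimal sketch of how those numbers break down. The function name and dictionary keys are just for illustration, and the 300ms took figure is my rough "about 50% of 600ms" average, not an exact measurement.

```python
def latency_breakdown(end_to_end_ms, took_ms, network_ms):
    """Split end-to-end latency into Elasticsearch time, network time, and the remainder."""
    unaccounted_ms = end_to_end_ms - took_ms - network_ms
    return {
        "took_ms": took_ms,
        "network_ms": network_ms,
        "unaccounted_ms": unaccounted_ms,
        "unaccounted_share": unaccounted_ms / end_to_end_ms,
    }


# Approximate figures from this post: ~600 ms end-to-end, took ~300 ms,
# network latency somewhere between 15 ms and 40 ms.
slow_network = latency_breakdown(600, 300, 40)
fast_network = latency_breakdown(600, 300, 15)
print(slow_network["unaccounted_ms"], fast_network["unaccounted_ms"])  # 260 285
```

So even with the slowest network figure I measured, something like 260ms per request is spent outside both Elasticsearch's reported took time and the network.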
My NEST queries are quite long because I'm doing a complex faceted multi-match query across my 11 indices. This requires a bunch of aggregations, filters, and post-filters. Additionally, I'm including a highlighter and score functions. An average NEST query comes out to about 800 lines of JSON.
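One way to check whether serialization of a body that size could plausibly cost hundreds of milliseconds is to time the serialization step in isolation. The sketch below only illustrates the idea: it uses Python's standard json module as a stand-in (not NEST's serializer), and the query shape, field names, and the build_sample_query and time_serialization helpers are all made up for the example. The numbers it prints won't transfer to .NET; the equivalent there would be wrapping the serialization call in a Stopwatch.

```python
import json
import time


def build_sample_query(num_facets=60):
    """Build a hypothetical faceted query body with many filter aggregations."""
    aggs = {
        f"facet_{i}": {
            "filter": {"term": {f"field_{i}": "value"}},
            "aggs": {"values": {"terms": {"field": f"field_{i}", "size": 20}}},
        }
        for i in range(num_facets)
    }
    return {
        "size": 0,
        "query": {"multi_match": {"query": "example", "fields": ["title", "body"]}},
        "aggs": aggs,
    }


def time_serialization(body, runs=100):
    """Return (average milliseconds per json.dumps call, serialized length in characters)."""
    start = time.perf_counter()
    for _ in range(runs):
        payload = json.dumps(body)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / runs, len(payload)


avg_ms, payload_chars = time_serialization(build_sample_query())
print(f"{avg_ms:.3f} ms per serialization, {payload_chars} characters")
```

If the isolated serialization time turns out to be tiny compared with the unexplained gap, the overhead is more likely somewhere else in the request path, such as building the query object, connection handling, or deserializing the response.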
Is this overhead really only due to serialization? Is there any way I can pare this down?