I need some advice on how to diagnose slow Elasticsearch queries.
- 2 node cluster in ElasticCloud (1 primary shard, 1 replica). note: ElasticCloud = no slowlog.
- Interacting with cluster via my Azure .NET Web App, using the NEST library
- Most response times for my web server are 50-80ms
- All query times in ES (e.g. took) are < 5ms.
- Network latency between my web server and ElasticCloud is about 15ms
Sometimes the response times jump to between 100 and 200ms, but the took is still 1ms. I was able to replicate this behaviour locally too (using the Elasticsearch Docker image).
Here's a trace I captured in Fiddler of the call from my app to Elasticsearch:
ClientConnected: 17:28:44.325
ClientBeginRequest: 17:34:34.953
GotRequestHeaders: 17:34:34.953
ClientDoneRequest: 17:34:34.953
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 17:34:27.538
FiddlerBeginRequest: 17:34:34.953
ServerGotRequest: 17:34:34.953
ServerBeginResponse: 17:34:35.171
GotResponseHeaders: 17:34:35.171
ServerDoneResponse: 17:34:35.172
ClientBeginResponse: 17:34:35.172
ClientDoneResponse: 17:34:35.178
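So the trace above says 218ms passed between ServerGotRequest (17:34:34.953) and ServerBeginResponse (17:34:35.171), i.e. that is how long my app waited for the Elasticsearch cluster to start responding. However, the took for that request is 1ms.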
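To spell out where the 218ms figure comes from, here is a minimal Python sketch of the timestamp arithmetic. The function and variable names are my own for illustration (they are not part of Fiddler or NEST), and it assumes all timestamps fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the Fiddler trace in this post.
TRACE = {
    "ClientBeginRequest": "17:34:34.953",
    "ServerGotRequest": "17:34:34.953",
    "ServerBeginResponse": "17:34:35.171",
    "ClientDoneResponse": "17:34:35.178",
}


def elapsed_ms(start: str, end: str) -> int:
    """Milliseconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds() * 1000)


# Time spent waiting for the first response byte from the cluster.
# Note this interval still includes the network round trip.
server_wait = elapsed_ms(TRACE["ServerGotRequest"], TRACE["ServerBeginResponse"])

# Whole request as seen by the client.
total = elapsed_ms(TRACE["ClientBeginRequest"], TRACE["ClientDoneResponse"])

took_ms = 1  # the took value Elasticsearch reported for this request
unaccounted = server_wait - took_ms  # time not explained by query execution

print(f"server wait: {server_wait}ms, total: {total}ms, unaccounted: {unaccounted}ms")
```

So almost all of the wait is outside what took measures, which is the part I'm trying to explain.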
How can I track this slow request down? Clearly it's not the query speed (since took is low), so it must be something else in the cluster.