I need some advice on how to diagnose slow Elasticsearch queries.
Setup
2-node cluster in ElasticCloud (1 primary shard, 1 replica). Note: ElasticCloud = no slowlog.
Interacting with cluster via my Azure .NET Web App, using the NEST library
Behaviour
Most response times for my web server are 50-80ms
All query times reported by ES (the took value) are < 5ms.
Network latency between my web server and ElasticCloud is about 15ms
Problem
Sometimes the response times jump to 100-200ms, but took is still 1ms. I was able to replicate this behaviour locally too (using the Elasticsearch Docker image).
Here's a trace from Fiddler I captured, showing the call to Elasticsearch from my app:
The took timer only starts to tick once the search is ready to be processed, so I wonder whether in your case you have a lot of requests queued up. As you mention, the slowlog would not be of any use, because once the query is ready to be processed it executes quickly. I would start by checking the node stats API and watching the thread pools' queue lengths.
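As a rough illustration of that check, here is a minimal Python sketch that reads the search thread pool numbers out of a node stats response (GET /_nodes/stats/thread_pool). The function names and the base_url value are placeholders of mine, not something from this thread, and authentication for a hosted cluster is left out.

```python
import json
import urllib.request


def fetch_thread_pool_stats(base_url):
    """Fetch thread pool statistics for every node via the node stats API."""
    # base_url is a placeholder such as "http://localhost:9200"; credentials
    # (needed on a hosted cluster) are deliberately omitted from this sketch.
    url = base_url + "/_nodes/stats/thread_pool"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def search_pool_summary(node_stats):
    """Return {node_name: {"active", "queue", "rejected"}} for the search pool."""
    summary = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        pool = node.get("thread_pool", {}).get("search", {})
        # Fall back to the node id if the response carries no node name.
        summary[node.get("name", node_id)] = {
            "active": pool.get("active", 0),      # threads busy right now
            "queue": pool.get("queue", 0),        # requests waiting right now
            "rejected": pool.get("rejected", 0),  # cumulative since node start
        }
    return summary
```

Calling search_pool_summary(fetch_thread_pool_stats("http://localhost:9200")) against a local node should return one entry per node. A non-zero queue while took stays low would be consistent with requests waiting before the timer starts.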
I assume this is the same stat as the one you want from the node stats API? Also, that API would only ever give me a point-in-time representation, which is hard to capture during a load test, more so since the slowness only appears to occur for a few seconds.
So, do the Stack Overflow pics I posted help in any way? Or are there more stats I should provide or look into?
Many thanks again. I've been stumped on this one for a while, so I'm glad someone is helping me out.
I had a look at the charts but nothing stands out immediately: GC is fine, and the search thread pool's queue never spikes, although you might be fooled by the sampling frequency. I'd still double-check the raw numbers via the node stats API as I suggested, especially the thread pool statistics. If that does not reveal anything, I'd take a closer look at the network between your client and Elasticsearch.
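To work around the sampling-frequency problem during a short load test, one option is to poll the stats at a tight interval and keep only the peaks. The sketch below is an assumption-laden illustration, not a tested tool: fetch_sample is a hypothetical callable you would supply that returns {node_name: {"queue": int, "rejected": int}} built from the node stats API, and the 30 s / 0.2 s defaults are arbitrary. Because the rejected counter is cumulative, its growth over the run is visible even if a spike falls between two samples, whereas the queue value is only a snapshot.

```python
import time


def poll(fetch_sample, duration_s=30.0, interval_s=0.2,
         sleep=time.sleep, clock=time.monotonic):
    """Call fetch_sample repeatedly for duration_s seconds and collect the results."""
    samples = []
    deadline = clock() + duration_s
    while clock() < deadline:
        samples.append(fetch_sample())
        sleep(interval_s)
    return samples


def summarize_samples(samples):
    """Reduce samples to the peak queue length and rejected-count growth per node."""
    first_rejected = {}
    result = {}
    for sample in samples:
        for node, pool in sample.items():
            # Remember the counter value at the first sighting of each node.
            first_rejected.setdefault(node, pool["rejected"])
            entry = result.setdefault(node, {"max_queue": 0, "rejected_delta": 0})
            entry["max_queue"] = max(entry["max_queue"], pool["queue"])
            entry["rejected_delta"] = pool["rejected"] - first_rejected[node]
    return result
```

Running summarize_samples(poll(my_fetch)) while the load test is in progress would give one line per node. If max_queue stays at 0 and rejected_delta is 0 across the slow window, queueing on the search pool looks unlikely and the client or network side becomes the more interesting suspect.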
I don't believe it's the network; the Fiddler trace I captured is proof of that. You can see the full trace in the Stack Overflow question, but the pertinent part is this: