When running some of our msearch queries, the response time is usually around 500ms. However, about 20% of the time it spikes to anywhere between 2000ms and 13000ms.
The index holds about 120GB of data in 20 shards with 1 replica, spread over 8 data nodes.
I've looked at the Profile API results, but I couldn't work out what's wrong. The output is also extremely long because it covers all 20 shards.
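For reference, the Profile output can be condensed to one number per shard, which might make a slow shard or node easier to spot. This is only a minimal sketch, assuming the usual `profile.shards[].searches[].query[].time_in_nanos` layout of the response; the function name `summarize_profile` and the sample data are made up for illustration:

```python
def summarize_profile(response):
    """Return (shard_id, query_ms) pairs, slowest shard first.

    Only the top-level query timings are summed. Each top-level
    time_in_nanos already includes the time of its children, and
    collector timings are reported separately, so they are left out.
    """
    totals = []
    for shard in response.get("profile", {}).get("shards", []):
        nanos = 0
        for search in shard.get("searches", []):
            nanos += sum(q.get("time_in_nanos", 0) for q in search.get("query", []))
        totals.append((shard.get("id", "?"), nanos / 1e6))
    return sorted(totals, key=lambda item: item[1], reverse=True)


# Made-up sample with two shards, not real output from the cluster.
SAMPLE = {
    "profile": {
        "shards": [
            {
                "id": "[node-a][my-index][0]",
                "searches": [{"query": [{"type": "TermQuery", "time_in_nanos": 2_000_000}]}],
            },
            {
                "id": "[node-b][my-index][1]",
                "searches": [{"query": [{"type": "TermQuery", "time_in_nanos": 9_500_000}]}],
            },
        ]
    }
}

for shard_id, query_ms in summarize_profile(SAMPLE):
    print(f"{shard_id}: {query_ms:.1f} ms")
```

For msearch, this would be applied to each entry of the `responses` array, since every search in the batch carries its own `profile` section.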
I've tried simplifying my query to just a term filter on 1 field, but the slow responses still occasionally appear.
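One check that may help narrow this down is comparing the `took` value reported in each search response with the wall-clock time measured on the client. `took` is measured on the coordinating node, so if it stays low while the client-side time spikes, the delay is probably outside the cluster (network, client, or response handling); if `took` itself is high, the time is being spent inside the cluster. A rough sketch, where the helper name `split_latency`, the threshold, and the sample numbers are all hypothetical:

```python
def split_latency(samples, threshold_ms=2000):
    """Break down slow requests into time reported by Elasticsearch and the rest.

    samples: iterable of (wall_ms, took_ms) pairs, where wall_ms is measured
    on the client and took_ms is the `took` field of the response.
    """
    slow = []
    for wall_ms, took_ms in samples:
        if wall_ms >= threshold_ms:
            slow.append(
                {
                    "wall_ms": wall_ms,
                    "took_ms": took_ms,
                    "outside_es_ms": wall_ms - took_ms,
                }
            )
    return slow


# Made-up measurements: one normal request and two slow ones.
MEASUREMENTS = [(510, 480), (2300, 2250), (13000, 450)]

for row in split_latency(MEASUREMENTS):
    print(row)
```

In this made-up data the 2300ms request is slow inside the cluster, while the 13000ms request spent almost all of its time outside it.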
The cluster is a prod cluster, so I also ran the same queries against a dev cluster, where the results were always really fast. The difference could be the extra traffic on prod, and we are going to try to simulate that on dev.
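Simulating the traffic on dev could start with something as simple as replaying the same request from several threads and recording the latency of each call. This is a rough sketch only: `send_request` is a placeholder for whatever performs the real msearch call, and the request count and concurrency are arbitrary assumptions rather than numbers taken from prod.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request):
    """Run one request and return its latency in milliseconds."""
    start = time.perf_counter()
    send_request()
    return (time.perf_counter() - start) * 1000.0


def run_load(send_request, total_requests=200, concurrency=20):
    """Issue total_requests calls from a thread pool; return sorted latencies in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, send_request) for _ in range(total_requests)]
        return sorted(f.result() for f in futures)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


if __name__ == "__main__":
    # Placeholder workload: replace the lambda with the real msearch call.
    latencies = run_load(lambda: time.sleep(0.001), total_requests=50, concurrency=10)
    print("p50 ms:", round(percentile(latencies, 50), 2))
    print("p80 ms:", round(percentile(latencies, 80), 2))
```

Since the slow responses show up about 20% of the time, watching the 80th percentile and above while raising the concurrency should show whether the spikes can be reproduced by load alone.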
Has anyone experienced something similar, or does anyone have an idea of what we could try?