We are trying to diagnose an issue where a query that is usually quick occasionally returns slowly.
Our queries aggregate on a field, entityId, which is a not_analyzed string.
We run an aggregation query which executes a terms aggregation on that field (entityId), with a specified size.
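For reference, the request body has roughly the shape sketched below. The aggregation name `by_entity` and the `size` of 10 are placeholders rather than our real values, and the outer `"size": 0` (return buckets only, no hits) is an assumption about how such a query is typically sent.

```python
import json


def build_terms_agg_body(field="entityId", agg_size=10):
    """Build a search body that runs a single terms aggregation on `field`."""
    return {
        "size": 0,  # assumption: only aggregation buckets are needed, no hits
        "aggs": {
            "by_entity": {  # placeholder aggregation name
                "terms": {"field": field, "size": agg_size}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(build_terms_agg_body(), indent=2))
```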
We have noticed that the query usually returns in ~10ms, but each time we write to the index (indexing a new document, reindexing an existing one, or deleting a document), the next 2 queries are much slower, around 400ms. When profiling the queries, we saw that those 2 slow queries were answered by two different sets of shards, probably reflecting how requests are distributed between primary shards and replicas.
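The pattern can be reproduced with a small timing harness along these lines. This is only a sketch: `write` and `query` are hypothetical callables that would wrap whatever client calls perform the index operation and the aggregation search, and nothing here is specific to a particular Elasticsearch client.

```python
import time


def time_queries_after_write(write, query, n_queries=4):
    """Call write() once, then time n_queries calls to query().

    Returns a list of elapsed times in milliseconds, one per query.
    `write` and `query` are hypothetical zero-argument callables.
    """
    write()
    timings_ms = []
    for _ in range(n_queries):
        start = time.perf_counter()
        query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms
```

If the behaviour described above holds, the first two entries of the returned list should be far larger than the rest.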
We suspect that the write operation causes the index to rebuild the data it needs to perform the aggregation, but we don't know why that would happen.
Our cluster consists of 3 nodes, with 6 shards (+6 replicas), running ES 2.3.
Your help would be appreciated.