We have an Elasticsearch cluster of 5 data nodes and 3 master nodes running version 5.2.2. The cluster has 10 indices, named txn_0, txn_1 ... txn_9, where index txn_I holds all transactions of the users whose user ID modulo 10 equals I. For example, all data for user ID 101 is in index txn_1. We also use shard routing so that each request is directed to a particular shard within the index.
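To make the layout concrete, here is a minimal sketch of how a request target is picked. The function names are only for illustration, and using the user ID itself as the routing value is an assumption for this example.

```python
NUM_INDICES = 10  # txn_0 ... txn_9


def index_for_user(user_id: int) -> str:
    """Return the index that holds all transactions of the given user."""
    return f"txn_{user_id % NUM_INDICES}"


def search_target(user_id: int) -> dict:
    """Return the index and routing value for a per-user search.

    Assumption: the user ID (as a string) is the routing value.
    """
    return {"index": index_for_user(user_id), "routing": str(user_id)}


print(index_for_user(101))  # txn_1
```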
Our query pattern is to search the data of a particular user. Recently, while trying to optimise query latency, we found that a terms query with a single value performed worse than the equivalent term query. By default, all our queries were terms queries, and when we switched to term we saw a massive improvement in query latency. Even after some research we are still not sure why this happens, so I am writing to the community for help in understanding this behaviour.
On this point, my understanding is that term queries are not cached by default, whereas terms queries are (depending on query frequency and segment size).
Below are our queries:
Old Query - https://gist.github.com/mkathuria/f54c8228876ee4891850ee1e529fdaee
New Query - https://gist.github.com/mkathuria/749df8d6e3b2560bd26b4f953a82231a
The only change was converting one particular terms query into a term query. The field in question, txnType, is a low-cardinality field with around 100 distinct values, but each value matches anywhere from a few million to 1 billion documents.
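Reduced to just the txnType clause, the change looks roughly like the sketch below. The value "PAYMENT" is a placeholder and the helper name is made up for this example. The full queries in the gists may contain other clauses that are not shown here.

```python
import json


def txn_type_clause(txn_type: str, use_term: bool) -> dict:
    """Build the txnType clause as either a term or a single-value terms query."""
    if use_term:
        # New form: term query with one value.
        return {"term": {"txnType": txn_type}}
    # Old form: terms query whose list holds a single value.
    return {"terms": {"txnType": [txn_type]}}


# "PAYMENT" is a placeholder value, not one of our real transaction types.
old_clause = txn_type_clause("PAYMENT", use_term=False)
new_clause = txn_type_clause("PAYMENT", use_term=True)

print(json.dumps(old_clause))
print(json.dumps(new_clause))
```

Both clauses match the same documents; only the query type differs.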