I have the following criteria in the query. The terms list for "seen by" can grow very large. There are also a couple of similar lists in the "must_not" clause, and those can grow large too.
It's hard to answer with exact numbers, but performance will obviously get
slower as the list grows. What I suggest in this case is to use a terms filter
with the execution mode set to bool (new in 0.19).
This means that the filter is cached per term, so you will get a
considerably better filter cache hit ratio. For common terms, the whole
thing then comes down to in-memory bitwise operations.
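
As a rough sketch of what that suggestion could look like, the snippet below builds a request body that wraps a terms filter with "execution": "bool" inside a filtered query, using the 0.19-era query DSL. The field name seen_by, the match_all query, and the helper build_seen_by_query are assumptions for illustration only; they are not taken from the original query, so adapt them to your mapping.

```python
import json


def build_seen_by_query(seen_by_ids, field="seen_by"):
    """Build a filtered query whose terms filter uses execution mode bool.

    `field` and the match_all query are placeholders (assumptions); swap in
    the real field name and query from your own request.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "terms": {
                        # The (possibly large) list of values to match.
                        field: list(seen_by_ids),
                        # Per-term filters combined as a bool filter, so
                        # each term can be cached on its own.
                        "execution": "bool",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = build_seen_by_query(["user1", "user2"])
    print(json.dumps(body, indent=2))
```

The same terms filter shape should work for the lists in the "must_not" clause; only the surrounding clause changes.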