must_not does indeed not translate to
If you run a bool query with must_not clauses, Lucene will first create an iterator that matches the should clauses, and then this iterator will be wrapped in order to exclude documents that match any of the must_not clauses.
Let me take an example: you have 1M documents in your index, and 1000 of them contain bar in the foo field. If you want to find all documents that match foo:bar, Lucene will just iterate over the postings list of foo:bar and call the collector on each entry. So you would decode 1000 documents from your postings list and call the collector 1000 times. Now if you execute the same clause as a must_not filter and have a match_all query as a must clause, Lucene will iterate over all documents matching the match_all query, and for each of them check whether it matches foo:bar and should be excluded. So you have to check 1M times whether the document matches foo:bar, and call the collector 999,000 times.
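To make the arithmetic above concrete, here is a toy model in Python. It is only a sketch of the two iteration patterns, not Lucene's actual code: the function names are made up for illustration, and the postings list is just a sorted range of document ids.

```python
NUM_DOCS = 1_000_000
# Hypothetical postings list for foo:bar: 1000 sorted doc ids.
FOO_BAR_POSTINGS = range(0, NUM_DOCS, 1000)


def run_positive(postings, collect):
    """Iterate the postings list directly; return how many docs were decoded."""
    decoded = 0
    for doc_id in postings:
        decoded += 1
        collect(doc_id)
    return decoded


def run_must_not(num_docs, excluded_postings, collect):
    """Iterate every doc (match_all) and skip those in the excluded postings.

    Returns how many exclusion checks were performed.
    """
    checks = 0
    excluded = iter(excluded_postings)
    current = next(excluded, None)
    for doc_id in range(num_docs):
        checks += 1
        # Advance the excluded iterator until it reaches or passes doc_id.
        while current is not None and current < doc_id:
            current = next(excluded, None)
        if current == doc_id:
            continue  # matches foo:bar, so it is excluded
        collect(doc_id)
    return checks


positive_hits = []
decoded = run_positive(FOO_BAR_POSTINGS, positive_hits.append)

negated_hits = []
checks = run_must_not(NUM_DOCS, FOO_BAR_POSTINGS, negated_hits.append)

print(decoded, len(positive_hits))  # 1000 1000
print(checks, len(negated_hits))    # 1000000 999000
```

The positive query does work proportional to the 1000 matching documents, while the negated one does work proportional to the whole index.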
This is why, when you have a boolean field, it is more efficient to encode false explicitly instead of only building an index for true and then searching for documents that have false as a value by searching for documents that don't have true.
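As an illustration, the two request bodies might look like the following, written as Python dicts. The field name is_active is an assumption made up for this example; the term, bool, must_not and match_all clauses are standard query DSL.

```python
import json

FIELD = "is_active"  # hypothetical boolean field name

# false is indexed explicitly: a plain term query walks one postings list.
explicit_false_query = {
    "query": {"term": {FIELD: False}}
}

# Only true is indexed: "false" has to be expressed as "not true",
# which means checking every document matched by match_all.
negated_true_query = {
    "query": {
        "bool": {
            "must": [{"match_all": {}}],
            "must_not": [{"term": {FIELD: True}}],
        }
    }
}

print(json.dumps(explicit_false_query))
print(json.dumps(negated_true_query))
```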
So essentially, if you want fast must_nots, it would probably be quicker to do the negation in application code and query Elasticsearch with a must containing a list of all possible values, excluding the ones in the must_nots?
This depends on how many possible values you have. If there are only a handful of them then this could help, but if there are thousands of possible values, this would not be an option.
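A rough sketch of that idea in application code could look like this. The helper name, the color field and the max_terms cutoff are all assumptions for illustration; the right cutoff would have to be measured on your own data.

```python
def build_exclusion_query(field, all_values, excluded, max_terms=16):
    """Build a request body that excludes some values of a field.

    If few allowed values remain, list them in a positive terms clause;
    otherwise fall back to must_not. max_terms is an arbitrary cutoff.
    """
    excluded_list = list(dict.fromkeys(excluded))  # de-duplicate, keep order
    excluded_set = set(excluded_list)
    allowed = [v for v in all_values if v not in excluded_set]

    if len(allowed) <= max_terms:
        # Handful of values: query them positively.
        return {"query": {"bool": {"must": [{"terms": {field: allowed}}]}}}

    # Too many possible values: keep the negated form.
    return {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "must_not": [{"terms": {field: excluded_list}}],
            }
        }
    }


colors = ["red", "green", "blue", "yellow"]
positive_body = build_exclusion_query("color", colors, ["red"])
negated_body = build_exclusion_query("color", colors, ["red"], max_terms=2)
```

Here positive_body lists green, blue and yellow in a terms clause, while negated_body keeps the must_not form because three allowed values exceed the cutoff of 2.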