Should Elasticsearch query filters be applied to fields with high cardinality?
From what I understand, filters are not scored, and they are cached.
However, I’m wondering if it’s better to cache only filters on low-cardinality fields. Would caching filters on high-cardinality fields waste memory, hurt performance, or limit the benefit of caching?
If your field is something like an ID specific to a user (e.g. a security filter added to narrow documents to that user’s content), then use the “preference” search option to ensure that a user’s requests go back to the same replica, which will hopefully have a warm cache.
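As a rough sketch of that suggestion, the snippet below builds a search request that puts the per-user restriction in a non-scoring `filter` clause and passes a per-user `preference` string. The index name `documents`, the fields `owner_id` and `content`, and the helper `build_user_search` are all made up for illustration; only the `bool`/`filter` query shape and the `preference` search parameter come from Elasticsearch itself.

```python
import json
from urllib.parse import urlencode


def build_user_search(index, user_id, text):
    """Return (path, body) for a search restricted to one user's documents.

    Hypothetical helper: index and field names are assumptions.
    """
    body = {
        "query": {
            "bool": {
                # Scored part of the query.
                "must": [{"match": {"content": text}}],
                # Non-scoring, cacheable restriction to the user's documents.
                "filter": [{"term": {"owner_id": user_id}}],
            }
        }
    }
    # A custom preference string (it must not start with "_"). Requests that
    # share the same string are routed to the same shard copies.
    params = urlencode({"preference": f"user-{user_id}"})
    return f"/{index}/_search?{params}", body


path, body = build_user_search("documents", "u42", "quarterly report")
print(path)
print(json.dumps(body, indent=2))
```

The path and body can then be sent with whatever HTTP or Elasticsearch client you already use. Whether the cache actually stays warm depends on your traffic and cluster, so treat this as a starting point for testing.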
Many high cardinality fields will have a Zipf-like distribution (very many low frequency terms but a few highly popular terms). The high popularity terms will benefit from being cached.
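To get a feel for why the popular terms matter, here is a small back-of-the-envelope calculation. It assumes an idealized Zipf distribution in which the term at rank r is requested with weight 1/r^s; the 100,000 distinct terms, the exponent of 1.0, and the function name `top_k_share` are illustrative assumptions, not measurements from a real index.

```python
def top_k_share(num_terms, k, exponent=1.0):
    """Fraction of all lookups that hit the k most popular terms,
    assuming an idealized Zipf distribution (weight = 1 / rank**exponent)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_terms + 1)]
    return sum(weights[:k]) / sum(weights)


# Share of filter lookups covered by the 100 most popular terms
# out of 100,000 distinct terms (both numbers are assumptions).
share = top_k_share(100_000, 100)
print(f"top 100 terms cover {share:.1%} of lookups")
```

Under these assumptions, 0.1% of the terms account for a bit over 40% of the lookups, which is the sense in which a cache can pay off even on a high-cardinality field. Real fields may be more or less skewed than this.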
However, this is speculation and only testing will give you the true answer for your particular environment.