Should filters be used for fields with high cardinality?

Hi all,

Should Elasticsearch query filters be applied to fields with high cardinality?

From what I understand, filters are not scored, and they are cached.

However, I’m wondering if it’s better to cache only filters on low-cardinality fields. Would caching filters for high-cardinality fields cause unnecessary memory usage or potentially reduce performance/limit the benefit?

It depends ™️

If your field is something like an ID specific to a user (e.g. a security filter added to narrow documents to that user’s content), then use the “preference” search option to ensure a user’s requests go back to the same shard copy, which will hopefully have a warm cache.
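
As a rough illustration, a per-user search could put the ID in the filter clause of a bool query and pass a custom preference string. This is only a sketch: the index name "docs", the fields "user_id" and "content", and the "user-<id>" preference format are made up for the example, and whether a given filter actually ends up in the cache is still decided by Elasticsearch's own caching policy.

```python
import json
from urllib.parse import urlencode


def build_user_search(index, user_id, text):
    """Build the path and body for a per-user search (hypothetical field names)."""
    # A custom preference string: searches sent with the same value are
    # routed to the same shard copies, so repeat requests can reuse caches.
    params = urlencode({"preference": f"user-{user_id}"})
    path = f"/{index}/_search?{params}"
    body = {
        "query": {
            "bool": {
                # Scored part of the query.
                "must": [{"match": {"content": text}}],
                # Filter context: not scored, eligible for caching.
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }
    return path, body


path, body = build_user_search("docs", "42", "quarterly report")
print("POST", path)
print(json.dumps(body, indent=2))
```

The path and body can then be sent with whatever HTTP client or Elasticsearch client you already use.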

Many high-cardinality fields will have a Zipf-like distribution (very many low-frequency terms but a few highly popular ones). The highly popular terms will benefit from being cached.
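
To get a feel for why, here is a small back-of-the-envelope calculation. It assumes an idealised Zipf distribution with exponent 1 (term frequency proportional to 1/rank) and that filter requests follow the same distribution as the terms; real data will differ, and the 10,000 / 100 figures are arbitrary.

```python
def zipf_coverage(num_terms, top_k, exponent=1.0):
    """Share of all requests that hit the top_k most popular terms,
    assuming request frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_terms + 1)]
    return sum(weights[:top_k]) / sum(weights)


# 10,000 distinct terms; how much traffic do the 100 most popular ones see?
share = zipf_coverage(10_000, 100)
print(f"Top 1% of terms receive about {share:.0%} of requests")
```

Under these assumptions the top 1% of terms account for a bit over half of all requests, so caching only those few filters would already cover a large share of the traffic.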

However, this is speculation and only testing will give you the true answer for your particular environment.
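
One way to test is to run a realistic query load and then look at the query cache section of the index stats API (GET /<index>/_stats/query_cache). The helper below is a sketch that computes a hit ratio from such a response; the sample numbers are invented for the example and are not measurements.

```python
def query_cache_hit_ratio(stats):
    """Return hits / (hits + misses) from an index stats response."""
    qc = stats["_all"]["total"]["query_cache"]
    lookups = qc["hit_count"] + qc["miss_count"]
    # Avoid dividing by zero on an index that has not been searched yet.
    return qc["hit_count"] / lookups if lookups else 0.0


# Made-up sample shaped like the relevant part of the stats response.
sample_stats = {
    "_all": {
        "total": {
            "query_cache": {
                "memory_size_in_bytes": 1048576,
                "hit_count": 750,
                "miss_count": 250,
                "evictions": 12,
            }
        }
    }
}

ratio = query_cache_hit_ratio(sample_stats)
print(f"query cache hit ratio: {ratio:.2f}")
```

A low hit ratio together with a growing eviction count would suggest the cache is being churned by filters that are rarely reused.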
