Hi,
We're doing some ES performance testing with a relatively small index. All
is peachy until we want to facet on a field that has relatively high
cardinality - in this case it's a "tags" field that, as you can imagine,
has a high number of distinct values across all documents in the index.
When we include faceting on tags in our queries, throughput drops from
over 400 QPS to 20-30 QPS, and average latency jumps from 40 ms to 500 ms.
Is there anything in ES that one can use to improve performance in such
cases?
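For reference, the kind of request I mean looks roughly like the sketch below. It uses the ES terms facet on the "tags" field; the match_all query, the facet name and the size of 10 are placeholders for illustration, not our actual test query.

```python
import json

# Sketch of a search request body with a terms facet on "tags".
# Assumptions: match_all stands in for the real query, and the
# facet name "tags" and size 10 are illustrative values only.
request_body = {
    "query": {"match_all": {}},
    "facets": {
        "tags": {
            "terms": {"field": "tags", "size": 10}
        }
    },
}

# This JSON would be sent as the body of a _search request.
print(json.dumps(request_body, indent=2))
```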
In Solr land there are 2 faceting methods, one of which is designed for "situations
where the number of indexed values for the field is high, but the number of
values per document is low":
Field Cache: If facet.method=fc then a field-cache approach will be used.
This is currently implemented using either the Lucene FieldCache
<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/search/FieldCache.html> or
(starting in Solr 1.4) an UnInvertedField if the field either is
multi-valued or is tokenized (according to FieldType.isTokened()
<http://wiki.apache.org/solr/FieldType>).
Each document is looked up in the cache to see what terms/values it
contains, and a tally is incremented for each value. This is excellent for
situations where the number of indexed values for the field is high, but
the number of values per document is low. For multi-valued fields, a hybrid
approach is used that uses term filters from the filterCache for terms that
match many documents.
Source: http://wiki.apache.org/solr/SolrFacetingOverview
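To make the quoted description concrete, here is a minimal sketch of the per-document tally idea as I understand it. This is not Solr's actual code: the function names and the in-memory layout are made up for illustration, and the hybrid filterCache path for very common terms is left out.

```python
def build_uninverted(docs):
    """Build a doc -> term-ordinal table once, so it can be cached.

    `docs` is a list where docs[i] is the list of tag values of
    document i (a hypothetical stand-in for the indexed field).
    """
    terms = sorted({t for tags in docs for t in tags})
    ord_of = {t: i for i, t in enumerate(terms)}
    doc_to_ords = [[ord_of[t] for t in tags] for tags in docs]
    return terms, doc_to_ords


def facet_counts(matching_doc_ids, terms, doc_to_ords, top_n=10):
    """Tally term counts by visiting only the matching documents."""
    counts = [0] * len(terms)
    for doc_id in matching_doc_ids:
        # Look up the document's values in the cached table and
        # increment one counter per value.
        for o in doc_to_ords[doc_id]:
            counts[o] += 1
    # Rank by count descending, breaking ties alphabetically.
    ranked = sorted(range(len(terms)), key=lambda o: (-counts[o], terms[o]))
    return [(terms[o], counts[o]) for o in ranked[:top_n] if counts[o] > 0]


docs = [["java", "search"], ["search", "lucene"], ["java"], ["solr", "search"]]
terms, doc_to_ords = build_uninverted(docs)
print(facet_counts([0, 1, 3], terms, doc_to_ords, top_n=2))
```

The tally loop does work proportional to the number of matching documents times the values per document, which is why it suits a field with many distinct values but few values per document.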
I didn't see anything like this in the ES docs, so I'm wondering whether
there is room for improvement in ES faceting.
Thanks,
Otis
Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html