We have an index with approximately 1 million documents each representing a product for an e-commerce store. We are using aggregations to calculate buckets representing attribute values for each attribute of the product. If we do a search, which limits the resultset to say 2000 products, performance is great (Elastic returns the result in less than 10 milliseconds). However, if we do a matchall query to get all products and their corresponding aggregations, elastic takes more than 3 seconds to return a result. If we disable the aggregations, performance is blazing again. Thus, it seems as though performance, when using aggregations, is very dependent the size of the resultset (With a matchall query we get around 1000 buckets). Is there anything special we need to be aware of in Elastic, in order to have a matchall query performing similarly to a query, which returns 2000 products?
Before we started using Elastic, we have worked with Lucene, and built our own facet abstraction on top of Lucene to be able to handle the above scenario. In this case, we pre-calculated facets on index startup and represented each facet as a term with a corresponding bitset. When doing searches, we retrieved a bitset for the query in question and “AND’ed” it with the precalculated bitset for each facet we wanted to show in the given scenario. With this implementation the speed at which we were able to calculate facet results did not differ depending on the size of the resultset of a given query. Only the number of documents and the number of facets influenced the speed. With this implementation, we were able to calculate more than 10.000 facets (buckets) at each search request with an index of 1 million documents and still get the resultset and facet results in less than 100 milliseconds.
Can anyone tell if this will be possible to achieve with Elastic and any pointers to what we are doing wrong (We are currently running tests on a setup at found.no with 1 cluster having 4 2,5Ghz cores and 1GB ram. The Elastic index takes up around 3,5GB of disk space