I have run various tests to benchmark faceting performance on ElasticSearch vs. Solr.
I have a test set of 2M documents with several hundred fields.
ElasticSearch uses 1 shard and 0 replicas so that the setup is comparable to Solr.
Indexing is about 4X faster on ElasticSearch than on Solr. On Solr I index in batches of 1000 docs and send a commit at the end.
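For clarity, the Solr indexing pattern described above (send documents in batches of 1000, commit once at the end) can be sketched as follows. This is a minimal illustration, not the actual benchmark client: `send_batch` and `commit` are hypothetical callbacks standing in for whatever HTTP calls post to Solr's update handler.

```python
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def index_all(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],
    commit: Callable[[], None],
    size: int = 1000,
) -> int:
    """Send every batch without committing, then commit once; return the batch count."""
    sent = 0
    for batch in batches(docs, size):
        send_batch(batch)  # hypothetical callback, e.g. a POST to Solr's update handler
        sent += 1
    commit()  # a single commit after the last batch
    return sent
```

With 2M documents and a batch size of 1000 this sends 2000 batches followed by exactly one commit.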
Search performance is still very good on both systems, both while indexing is running and when it is not.
Both JVMs were limited to 5G of RAM.
When faceting, we discovered that:
- Solr is able to facet over 2M docs on many fields using less than 1G of RAM (ps output shows 7% memory usage on an 8G PC). Faceting during indexing is slow (4-17 secs: faster at the beginning, slower as the collection grows). With no indexing running it is very fast (0.2-1.5 secs).
- ElasticSearch is unable to finish faceting on even a single field (I chose the field with the fewest distinct values): after eating all 5G of RAM it starts displaying out-of-memory errors.
The Lucene index on disk is 4.7G for Solr and 3.9G for ElasticSearch.
To make the facet search realistic, the test query returns 80% of the records in the collection.
A match_all query plus faceting on a single field generates OOM errors in ElasticSearch.
It seems really strange to me that ElasticSearch needs all that RAM to cache data for faceting.
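The post does not show the exact requests, so as a hedged sketch only, this is roughly what a match_all query plus a single terms facet looks like on each system. The field name `category` is hypothetical. The ElasticSearch body uses the legacy `facets` syntax of that era (facets were later replaced by aggregations and removed in 2.0); the Solr side is a query string for the `/select` handler using the standard `facet`, `facet.field` and `facet.limit` parameters.

```python
from urllib.parse import urlencode


def es_terms_facet_body(field: str, size: int = 10) -> dict:
    """Search body in the legacy ElasticSearch facets syntax (removed in 2.0)."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # return only facet counts, no hits
        "facets": {
            field: {"terms": {"field": field, "size": size}},
        },
    }


def solr_facet_query(field: str, limit: int = 10) -> str:
    """Query string for Solr's /select handler: all docs, facet on one field."""
    return urlencode({
        "q": "*:*",
        "rows": 0,  # return only facet counts, no hits
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    })
```

Note that the actual test query in the benchmark matches about 80% of the documents rather than all of them; this sketch covers only the match_all case.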
A freshly restarted Solr completes the first faceting query in 10 sec. Every query after the first is very fast (0.6 sec) and uses 6% of RAM.
A freshly restarted ElasticSearch completes the test query in 19 sec but uses 40% of RAM. Every subsequent run of the same query also takes 19 sec.
The analyzers are the same in both configurations.
Any suggestions or hints? For those of you doing faceting on large collections, what performance are you seeing?
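To put the memory readings in absolute terms, the ps percentages can be converted to gigabytes. This is a rough sketch that assumes every reading was taken on the same 8G machine mentioned earlier; ps `%MEM` is resident memory relative to physical RAM, which is not the same thing as the 5G JVM heap limit.

```python
def percent_of_ram_to_gb(percent: float, total_gb: float = 8.0) -> float:
    """Convert a ps %MEM reading to GB, assuming an 8G machine."""
    return total_gb * percent / 100.0


readings = {
    "Solr, faceting 2M docs": 7.0,
    "Solr, repeated query": 6.0,
    "ElasticSearch, test query": 40.0,
}
for label, pct in readings.items():
    print(f"{label}: ~{percent_of_ram_to_gb(pct):.2f} GB")
```

Under that assumption, Solr stays around half a gigabyte, while ElasticSearch is above 3 GB for the same query.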