I have run several tests to benchmark faceting performance in ElasticSearch vs. Solr.
I have a test set of 2M documents with several hundred fields.
ElasticSearch uses 1 shard and 0 replicas so that the setup is comparable to Solr.
Indexing is about 4X faster on ElasticSearch than on Solr. Indexing was done in batches of 1000 docs, with a commit sent at the end on Solr.
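The indexing loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual test harness: `send_batch` and `commit` are hypothetical callables standing in for whatever client posts the update requests. Since the post does not say whether the commit followed every batch or only the last one, both variants are covered by a flag.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[T] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def index_all(
    docs: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical: one update request per batch
    commit: Callable[[], None],             # hypothetical: issues a Solr commit
    batch_size: int = 1000,
    commit_per_batch: bool = False,
) -> int:
    """Send documents in batches and return the number of batches sent.

    With commit_per_batch=False a single commit is issued after the last batch;
    with True, a commit follows every batch (the costlier pattern).
    """
    sent = 0
    for batch in batches(docs, batch_size):
        send_batch(batch)
        sent += 1
        if commit_per_batch:
            commit()
    if not commit_per_batch:
        commit()
    return sent
```

For 2M documents, `batch_size=1000` means 2,000 update requests; 5,000-document batches would cut that to 400 (and to 400 commits instead of 2,000 when committing per batch).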
Search performance is still very good on both systems, both during indexing and when no indexing is running.
Both JVMs were limited to 5G of RAM.
When faceting, we discovered the following:
Solr is able to facet over 2M docs on many fields using less than 1G of RAM (ps output shows 7% memory usage on an 8G PC). Faceting is slow during indexing (4-17 secs: faster at the beginning, slower as the collection grows) and very fast (0.2-1.5 secs) when no indexing is running.
ElasticSearch is unable to finish faceting on even a single field (I chose a field with fewer distinct values): after eating all 5G of RAM, it starts throwing out-of-memory errors.
Solr's Lucene index on disk is 4.7G; ElasticSearch's is 3.9G.
To make the facet search realistic, the test query returns 80% of the records in the collection.
A match_all query plus faceting on a single field in ElasticSearch generates OOM errors.
It seems really strange to me that ElasticSearch needs that much RAM for facet caching.
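One plausible reason faceting needs a lot of memory up front: a common way to implement term facets (roughly what Solr's default "fc" method does; ElasticSearch's term facets likewise load field values into memory) is to un-invert the field into a per-document array held in RAM, then count over the matching documents. The simplified sketch below assumes a single-valued field with exactly one value per document; it illustrates the general technique and is not either project's actual code.

```python
from array import array
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def uninvert(values: Sequence[str]) -> Tuple[List[str], array]:
    """Build the per-field structure kept in memory: the sorted distinct terms
    plus one term ordinal per document in the index.

    Its size depends on the total number of documents (and distinct terms),
    not on how many documents a particular query matches.
    """
    terms = sorted(set(values))
    ord_of: Dict[str, int] = {term: i for i, term in enumerate(terms)}
    # "i" is a signed C int: 4 bytes per document on common platforms.
    ords = array("i", (ord_of[v] for v in values))
    return terms, ords


def facet_counts(
    terms: List[str], ords: array, matching_docs: Iterable[int], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Count term ordinals over the documents matched by the query."""
    counts = Counter(ords[doc] for doc in matching_docs)
    return [(terms[o], c) for o, c in counts.most_common(top_n)]
```

With 2M documents, the ordinal array alone takes about 8 MB per faceted field (2M × 4 bytes) before counting the term strings; multi-valued fields and many faceted fields multiply that. The sketch does not by itself explain the OOM reported here.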
A freshly restarted Solr completes the first faceting query in 10 sec. Every query after the first is very fast (0.6 sec) and uses 6% of RAM.
A freshly restarted ElasticSearch completes the test query in 19 sec, but uses 40% of RAM. Every run of the same query takes 19 sec.
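To put the `ps` percentages in absolute terms, a quick conversion, assuming the 8G machine has 8 GB of physical memory and that `ps` %MEM is resident memory as a share of physical RAM (which is how procps `ps` defines it):

```python
TOTAL_RAM_GB = 8.0  # the "8G PC" mentioned in the report


def mem_gb(percent: float, total_gb: float = TOTAL_RAM_GB) -> float:
    """Convert a ps %MEM figure into gigabytes of resident memory."""
    return total_gb * percent / 100.0


solr_warm = mem_gb(6)  # Solr after the first faceting query
es_query = mem_gb(40)  # ElasticSearch while running the test query
print(f"Solr ~{solr_warm:.2f} GB, ElasticSearch ~{es_query:.2f} GB, "
      f"ratio ~{es_query / solr_warm:.1f}x")
# prints: Solr ~0.48 GB, ElasticSearch ~3.20 GB, ratio ~6.7x
```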
The analyzers are the same in both configurations.
Any suggestions or hints? Does anybody with a large collection use faceting, and what performance do you get?
What type of facets are you executing in elasticsearch and in Solr?
On Mon, Aug 29, 2011 at 8:49 PM, Dario Rigolin drigolin@gmail.com wrote:
Can you post some sample docs that you index into elasticsearch? How many
possible values do you have for the field you facet on? Can it have more
than one value per doc?
On Mon, Aug 29, 2011 at 11:19 PM, Dario Rigolin drigolin@gmail.com wrote:
Interesting test. I'm not sure what is causing you trouble with faceting,
though: my setup with over 6 million docs (small tweets with only ~10 to
20 fields each) has good performance, and memory usage is OK (for ES).
And just to make sure: you are using term facets in ES, and in Solr you
are using the standard facet parameters and not facet.method=enum or
something similar, right?
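The question above hinges on the difference between Solr's default faceting and `facet.method=enum`. Roughly, enum walks the field's terms and intersects each term's documents with the query result (Solr caches those per-term document sets in its filterCache), which tends to suit fields with few distinct values. A simplified sketch of that idea, using plain Python sets as stand-ins for posting lists; it is an illustration, not Solr's implementation:

```python
from typing import Dict, Iterable, List, Set, Tuple


def enum_facet(
    postings: Dict[str, Set[int]], matching_docs: Iterable[int], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Facet by enumerating terms: for every distinct term of the field,
    intersect its document set with the query's document set.

    Cost grows with the number of distinct terms, but no per-document
    array is built for the whole index.
    """
    hits = set(matching_docs)
    counts = [(term, len(docs & hits)) for term, docs in postings.items()]
    counts = [(term, n) for term, n in counts if n > 0]
    counts.sort(key=lambda tc: (-tc[1], tc[0]))  # highest count first, then by term
    return counts[:top_n]
```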
Regarding indexing performance: you can use the streaming option for
Solr, which should speed things up. Increasing the batch size to e.g.
5000 should also help. Solr will always have this problem (in the
current versions), though, because it always requires a costly "commit";
still, 4X faster seems a bit high...
Now regarding the difference in index size: do you use stored fields
for Solr and ES? Do you use the "_all" field in ES? Did you optimize
both indices after indexing?
As I've said to you on IRC, it would be really helpful to have a small
reproduction of this issue.
Thanks
Clint
On Mon, 2011-08-29 at 19:52 +0200, Dario Rigolin wrote:
I didn't measure anything, as the indexing process also changed when
moving from Solr to Elasticsearch, but my feeling was that memory
usage did not change and performance got a bit better. But again:
just feelings...