Faceting memory issue: ElasticSearch 0.17.6 vs Solr 3.3

I have run various tests to benchmark faceting performance on ElasticSearch vs Solr.
I have a test set of 2M documents with several hundred fields.
ElasticSearch uses 1 shard and 0 replicas so that the setup is comparable to Solr's.

Indexing is about 4x faster on ElasticSearch than on Solr. On Solr I index in batches of 1000 docs and send a commit at the end.

Search performance remains very good on both systems, both while indexing and while not indexing.

Both JVMs were limited to 5G of RAM.

When faceting, we discovered that:

  • Solr is able to facet 2M docs on many fields using less than 1G of RAM (ps output shows 7% memory use on an 8G PC). Faceting is slow during indexing (4-17 secs: faster at the beginning, slower as the collection grows) and very fast (0.2-1.5 secs) when not indexing.
  • ElasticSearch is unable to finish faceting on even a single field (I chose a field with fewer distinct values): after eating all 5G of RAM it starts throwing out-of-memory errors.

The Solr Lucene index on disk is 4.7G; the ElasticSearch one is 3.9G.

To make the facet test realistic, the test query returns 80% of the records in the collection.
A match_all query plus faceting on a single field in ElasticSearch generates OOM errors.
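For reference, a match_all query with a single terms facet in the pre-1.0 ElasticSearch `facets` syntax can be built as below. This is a sketch: the field name `language` is hypothetical (the thread only says the facets were on language and country codes), and the body is just printed, not sent to a server.

```python
import json

# Hypothetical field name; the real field names are not given in the thread.
FACET_FIELD = "language"


def terms_facet_request(field, size=10):
    """Build an ElasticSearch 0.x search body: match_all plus one terms facet."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # The facet is named after the field it counts.
            field: {"terms": {"field": field, "size": size}},
        },
    }


body = terms_facet_request(FACET_FIELD)
print(json.dumps(body, indent=2))
```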

It seems really strange to me that ElasticSearch needs so much RAM to cache data for faceting.

A freshly restarted Solr completes the first faceting query in 10 sec. Every subsequent query is very fast (0.6 sec) and uses 6% of RAM.
A freshly restarted ElasticSearch completes the test query in 19 sec but uses 40% of RAM. Every run of the same query takes 19 sec.

The analyzers are the same in both configurations.

Any suggestions or hints? For anybody with a large collection doing faceting: what performance do you see?

Dario Rigolin
drigolin@gmail.com

What type of facets are you executing in elasticsearch and in Solr?

On Mon, Aug 29, 2011 at 8:49 PM, Dario Rigolin drigolin@gmail.com wrote:

[...]

On Aug 29, 8:06 pm, Shay Banon kim...@gmail.com wrote:

What type of facets are you executing in elasticsearch and in Solr?

String facets, on language and country codes.
I'm indexing bibliographic records.

Can you post some sample docs that you index into elasticsearch? How many
possible values do you have for the field you facet on? Can it have more
than one value per doc?
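Shay's question about values per doc matters because a per-document facet cache grows with both the number of documents and the number of values per document. The arithmetic below is a rough sketch under a loudly hypothetical layout (a dense array of 4-byte ordinals, one row per "value slot", with as many slots as the largest number of values any single doc has). It is not a description of ElasticSearch 0.17.6's actual field cache, and it ignores the memory for the term strings themselves.

```python
# Back-of-envelope facet-cache estimate.
# ASSUMPTION (hypothetical layout, not taken from the ElasticSearch source):
# one 4-byte int ordinal per document per value slot, where the number of
# slots equals the maximum number of values any single document has.

BYTES_PER_ORDINAL = 4  # size of a Java int


def ordinal_array_bytes(num_docs, max_values_per_doc):
    """Bytes for a dense [max_values_per_doc][num_docs] int array."""
    return num_docs * max_values_per_doc * BYTES_PER_ORDINAL


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


NUM_DOCS = 2_000_000  # from the thread: 2M documents

for max_values in (1, 10, 100):
    size = ordinal_array_bytes(NUM_DOCS, max_values)
    print(f"max {max_values:>3} values/doc -> {to_mib(size):8.1f} MiB")
```

Under this assumption, a single doc with an unusually large number of values would inflate the cost for every document, which is one reason the multi-valued question is worth answering.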

On Mon, Aug 29, 2011 at 11:19 PM, Dario Rigolin drigolin@gmail.com wrote:

[...]

Interesting test. I'm not sure what is causing your trouble with
faceting, though, as my setup with over 6 million docs (small tweets
with only ~10 to 20 fields) has good performance and memory
usage is ok (for ES). And just to make sure: you are using terms
facets in ES, and in Solr you are using the standard facet parameters
and not facet.method=enum or something, right?
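To make the comparison concrete, a Solr facet request using the standard field-cache method (`fc`, Solr's default) rather than `enum` could be built as below. Sketch only: the URL `http://localhost:8983/solr/select` and the field names `language` and `country` are assumptions, not taken from Dario's setup.

```python
from urllib.parse import urlencode

# Hypothetical Solr URL; adjust to the real host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def solr_facet_query(fields, method="fc"):
    """Build a Solr select URL that returns facet counts only."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),             # facet counts only, no documents
        ("facet", "true"),
        ("facet.method", method),  # "fc" is the default; "enum" iterates over terms
    ]
    # facet.field may be repeated once per field to facet on.
    params += [("facet.field", f) for f in fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(solr_facet_query(["language", "country"]))
```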

Regarding indexing performance: you can use the streaming option for
Solr, which should speed things up. Increasing the batch size to
e.g. 5000 should also help. Solr will always have a disadvantage here
(in the current versions) because it requires a costly "commit",
but 4x faster seems a bit high...
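The batch-size suggestion amounts to sending fewer, larger update requests. A minimal, client-agnostic sketch follows; `post_batch` and `commit` are hypothetical stand-ins for whatever Solr client is in use (not real library calls), and committing only once at the end is an assumption made here to avoid paying the commit cost per batch.

```python
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` docs."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def index_all(docs: Iterable[dict],
              post_batch: Callable[[List[dict]], None],
              commit: Callable[[], None],
              batch_size: int = 5000) -> int:
    """Send docs in batches, then commit once; returns the number of docs sent."""
    sent = 0
    for batch in batches(docs, batch_size):
        post_batch(batch)  # hypothetical: one update request per batch
        sent += len(batch)
    commit()               # hypothetical: single commit at the end
    return sent
```

For example, 12001 docs with `batch_size=5000` produce three update requests (5000, 5000 and 2001 docs) followed by one commit.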

Now regarding the different index sizes: do you use stored fields in
both Solr and ES? Do you use the _all field in ES? Did you optimize both
indices after indexing?
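Since stored fields and the `_all` field both add to index size, it helps to check the mapping. A sketch of an ElasticSearch 0.x mapping body (the kind one would PUT to a type's `_mapping` endpoint) with `_all` disabled and the facet fields left unanalyzed is shown below; the type name `record` and all field names are hypothetical.

```python
import json

# Hypothetical type and field names for a bibliographic record.
mapping = {
    "record": {
        "_all": {"enabled": False},  # no catch-all _all field
        "properties": {
            # Facet fields: indexed as single, unanalyzed terms.
            "language": {"type": "string", "index": "not_analyzed"},
            "country": {"type": "string", "index": "not_analyzed"},
            # Stored explicitly; this adds to index size on disk.
            "title": {"type": "string", "store": "yes"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```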

Regards,
Peter.

--

http://jetsli.de - News 4 Geeks

Hi Dario

As I've said to you on IRC, it would be really helpful to have a small
reproduction of this issue.

Thanks

Clint

On Mon, 2011-08-29 at 19:52 +0200, Dario Rigolin wrote:

[...]

memory usage is ok (for ES).

Do you generally find Solr has lower memory usage than ES in faceting?

I didn't measure anything, since the indexing process also changed when
moving from Solr to Elasticsearch, but my impression was that memory
usage had not changed and performance got a bit better. But again:
just a gut feeling...

Peter.


On 31 Aug., 22:01, Andy selforgani...@gmail.com wrote:

[...]