Faceting memory issue ElasticSearch 0.17.6 vs Solr 3.3

drigolin · August 29, 2011, 5:52pm

I have run various tests to benchmark faceting performance on ElasticSearch vs Solr.
My test set is 2M documents with many hundreds of fields.
ElasticSearch is using 1 shard and 0 replicas so that the scenario is comparable to Solr.
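
For anyone wanting to reproduce a comparable single-shard setup, the index settings could be built roughly as below. This is only a sketch: the helper name, the index name `biblio`, and the host in the comment are assumptions for illustration, not the configuration actually used in the test.

```python
import json


def single_shard_settings(shards=1, replicas=0):
    """Build an index-creation body with the given shard/replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


body = single_shard_settings()
# This JSON would be sent as the body of a PUT request to something like
# http://localhost:9200/biblio (host and index name are hypothetical).
print(json.dumps(body, indent=2))
```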

Indexing is about 4X faster on ElasticSearch than on Solr. On Solr I was indexing batches of 1000 docs, with a commit sent at the end.

Search performance is still very good on both systems, both during indexing and when not indexing.

Both JVMs were limited to 5G of RAM.

Doing faceting we discovered that:

Solr is able to facet over 2M docs on many fields using less than 1G of RAM (ps output shows 7% mem on an 8G PC). Faceting performance is slow during indexing (4-17 secs: faster at the beginning, slower as the collection grows) and very fast (0.2-1.5 secs) when not indexing.
ElasticSearch is unable to finish faceting on a single field (I chose a field with fewer distinct values): after eating all 5G of RAM it starts displaying out of memory errors.

The Solr Lucene index on disk is 4.7G; the ElasticSearch one is 3.9G.

To simulate real facet searching, the query used for testing returns 80% of the records in the collection.
Even a match_all query + faceting on a single field in ElasticSearch generates OOM errors.
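
A request of the kind described here (match_all plus a terms facet on one field) might look roughly like the body built below. This is a sketch rather than the exact query from the test: the field name `language` and the facet size are assumptions, and the helper function is hypothetical.

```python
import json


def terms_facet_request(field, size=10):
    """Build a search body: match_all query plus one terms facet on `field`."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # The facet is named after the field it counts.
            field: {"terms": {"field": field, "size": size}}
        },
    }


request_body = terms_facet_request("language")
# This JSON would be sent as the body of a _search request.
print(json.dumps(request_body, indent=2))
```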

It sounds really strange to me that ElasticSearch needs all that RAM for facet caching.

A freshly restarted Solr completes the first faceting query in 10 sec. Every query after the first one is very fast (0.6 sec), and 6% of RAM is used.
A freshly restarted ElasticSearch completes the test query in 19 sec but uses 40% of RAM. Every run of the same query takes 19 sec.

The analyzers are the same in both configurations.

Any suggestions or hints? Anybody doing faceting on a large collection: what performance do you see?

Dario Rigolin
drigolin@gmail.com

kimchy · August 29, 2011, 6:06pm

What type of facets are you executing in elasticsearch and in Solr?

drigolin · August 29, 2011, 8:19pm

On Aug 29, 8:06 pm, Shay Banon kim...@gmail.com wrote:

What type of facets are you executing in elasticsearch and in Solr?

String facets, on language and country codes.
I'm indexing bibliographic records.

kimchy · August 30, 2011, 11:06am

Can you post some sample docs that you index into elasticsearch? How many
possible values do you have for the field you facet on? Can it have more
than one value per doc?

Karussell1 · August 31, 2011, 8:06am

Interesting test. I'm not sure what is causing you trouble in the
faceting part, though, as my setup with over 6 million docs (only small
tweets with only ~10 to 20 fields) has good performance and memory
usage is ok (for ES). And just to make sure: you are using terms
facets in ES, and in Solr you are using the standard facet parameter
and not facet.method=enum or something, right?
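
To make the Solr side of that question concrete, the two variants could be expressed as query strings like the ones built below. A sketch only: the field name `language` and the helper function are assumptions, and `rows=0` is added just to suppress the hit list.

```python
from urllib.parse import parse_qsl, urlencode


def solr_facet_params(field, method=None):
    """Build a Solr query string for a match-all query with one facet field.

    With method=None Solr picks its default facet method; passing "enum"
    adds facet.method=enum explicitly.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
    ]
    if method is not None:
        params.append(("facet.method", method))
    return urlencode(params)


default_qs = solr_facet_params("language")
enum_qs = solr_facet_params("language", method="enum")
print(default_qs)
print(enum_qs)
```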

Regarding indexing performance: you can use the streaming option for
Solr, which should speed things up. Increasing the batch size to
e.g. 5000 should also help. Solr will always have a disadvantage
(in the current versions) because it requires a costly "commit",
but 4X faster seems a bit high...
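
Changing the batch size is mostly a matter of how the documents are grouped before sending. A minimal sketch of that grouping follows; the function name is hypothetical, and the actual HTTP posting and the single final commit are left out.

```python
from itertools import islice


def batches(docs, batch_size=5000):
    """Yield successive lists of at most `batch_size` docs from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


# 12000 dummy docs split into batches of 5000: the last batch is smaller.
sizes = [len(b) for b in batches(range(12000), batch_size=5000)]
print(sizes)  # [5000, 5000, 2000]
```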

Now regarding your different index sizes: do you use stored fields for
Solr and ES? Do you use the "_all" field in ES? Did you optimize both
indices after indexing?

Regards,
Peter.

--

http://jetsli.de - News 4 Geeks

Clinton_Gormley · August 31, 2011, 8:17am

Hi Dario

As I've said to you on IRC, it would be really helpful to have a small
recreation of this issue.

thanks

Clint

Andy_2 · August 31, 2011, 8:01pm

memory usage is ok (for ES).

Do you generally find Solr has lower memory usage than ES in faceting?

Karussell1 · August 31, 2011, 9:30pm

I didn't measure anything, as the indexing process also changed when
moving from Solr to Elasticsearch, but my feeling was that memory
usage did not change and performance got a bit better - but again:
just feelings ...

Peter.

--

http://jetsli.de - News 4 Geeks