On Sep 7, 2011, at 8:40 PM, jprante wrote:
Hi Dario,
On Sep 7, 7:45 pm, Dario Rigolin drigo...@gmail.com wrote:
What I can quickly guess from that gist is:
- you declare ten facets; all of those facets will "sum up" and slow
down the single ES node. Do you really need ten facets, or do they
replicate the same data? Will you need to present all ten facets to
the user at once? E.g. biblevel_full and class_desc look like
candidates for removal.
We usually need more than those 10.
We fully index every UNIMARC subfield, and we create sort and facet fields for them.
I cannot remove them. We also cannot use stop words because librarians need to find books with titles like "The and or not"...
Jörg, you indexed 18M records on 3 ES nodes; what speed do you get for a facet query on the author field, with a match_all query or a query like "berlin"?
What is the hardware configuration of your nodes?
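For reference, this is roughly the shape of query I am timing (a minimal sketch against the 0.x facets API; the index name "biblio" and field name "author" are placeholders, not the exact names from my gist):

  # terms facet on the author field for a "berlin" query;
  # swap the query for { "match_all": {} } for the other case
  curl -XGET 'http://localhost:9200/biblio/_search?pretty=true' -d '{
    "query": { "query_string": { "query": "berlin" } },
    "facets": {
      "author": { "terms": { "field": "author", "size": 10 } }
    }
  }'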
- one shard is far too few if you have a multi-core CPU; think about
using "at least one shard per core", that's my rule of thumb. Then
the resource consumption of facet computation will spread over the
cores more easily.
Using 5 shards I was running out of memory in my previous tests.
I can try 2, as the CPU is a dual core.
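If I understand the suggestion correctly, that means recreating the index with something like this (a sketch; the index name and replica count are just an example):

  # create the index with 2 primary shards, one per core
  curl -XPUT 'http://localhost:9200/biblio/' -d '{
    "settings": {
      "index": {
        "number_of_shards": 2,
        "number_of_replicas": 1
      }
    }
  }'

As far as I know the shard count cannot be changed on an existing index, so this means a full reindex on my side.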
More analysis is surely possible with some statistics about the facets
(result length, values, cardinalities), and the documents and queries
you use.
My tests were very simple, but I was looking for numbers on ES performance compared to SOLR.
I know that faceting on large sets is a very memory- and CPU-intensive task, and that caching is a key point for getting good performance.
I was expecting ES to be as fast as SOLR at faceting, and given the other good things ES can do, we were planning to move from SOLR to ES in our OPAC application. But faceting performance on medium record sets (> 1.5M) makes me think carefully.
ES scaling is very nice: I can add more nodes and performance increases (in SOLR this cannot be done so easily). But comparing pure "single node" performance makes me think that:
- I need to understand better how ES faceting works and how it can be optimized.
- At the moment 11M records are handled easily by a single SOLR node with 8G of RAM. If moving to ES means adding more hardware and more RAM, that is not a simple process for us.
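Before adding hardware I will at least try giving the ES JVM more of the 8G box and watch how much heap the facet field cache takes. A rough sketch of what I mean (ES_MIN_MEM/ES_MAX_MEM as read by bin/elasticsearch.in.sh in the 0.x series; the exact variable names may differ by version and packaging):

  # give the single node a 4g heap before starting it;
  # the values are only an example for an 8G machine
  export ES_MIN_MEM=4g
  export ES_MAX_MEM=4g
  ./bin/elasticsearch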
Best regards,
Jörg
Dario Rigolin
drigolin@gmail.com