Thanks Jörg and Chris for the quick reply. Let me explain the situation, which should make clear what I am trying to accomplish. I am not giving the exact domain, but a similar example in a different domain.
Scenario: Let's say I am storing a list of restaurant reviews from all over the web. Each review document can have the following fields:
review_id (long)
review_ratings (object)
    aspect_1_name (string) : rating (float)
    aspect_2_name (string) : rating (float)
    aspect_3_name (string) : rating (float)
    ...
date (datetime)
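For concreteness, a single review document might look roughly like this (the aspect names and values are made up for illustration):

{
  "review_id": 12345,
  "review_ratings": {
    "taste": 4.5,
    "smell": 3.0,
    "visual_appeal_of_dish": 4.0
  },
  "date": "2012-11-01T10:00:00"
}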
Requirement: The goal is to be able to calculate facets on "review aspects" and get the average rating for each aspect across all reviews within a given period. In this case, aspects can be things like "visual appeal of dish", "taste", "smell", etc. Hypothetically, let's assume the number of aspects can grow to 20k. Note that a single review may only have a dozen or so aspects defined, but millions of reviews over a period might collectively have thousands of different aspects. So our mapping will become huge because of the "review_ratings" object.
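To make that last point concrete, the "review_ratings" part of the mapping would accumulate one property per distinct aspect ever indexed, roughly like this sketch (aspect names invented):

"review_ratings": {
  "properties": {
    "taste":                 { "type": "float" },
    "smell":                 { "type": "float" },
    "visual_appeal_of_dish": { "type": "float" },
    ... one entry per distinct aspect, potentially thousands ...
  }
}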
Now, to achieve this, I can use a date histogram facet keyed on the date field, with the aspect's rating field as the value. To get facets over, say, 100 aspects, I can create 100 facets, one for each aspect, so I will only ever be querying around 100 aspects at a time to get their average rating.
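If I am reading the facet API right, the request would be roughly like the following, with one date_histogram facet per aspect (the facet names, date range and "month" interval are just placeholders):

{
  "query": {
    "range": { "date": { "from": "2012-01-01", "to": "2012-06-30" } }
  },
  "facets": {
    "taste_avg": {
      "date_histogram": {
        "key_field": "date",
        "value_field": "review_ratings.taste",
        "interval": "month"
      }
    },
    "smell_avg": {
      "date_histogram": {
        "key_field": "date",
        "value_field": "review_ratings.smell",
        "interval": "month"
      }
    }
  }
}

Each bucket should then carry a mean for that aspect's rating, and I would repeat the facet block for each of the ~100 aspects I query at a time.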
Now that you know a sample scenario, can you guys tell me if my approach is correct, or whether I am doing something fundamentally wrong?
Thanks a lot again for the help, guys!
Vinay
On Mon, Nov 12, 2012 at 12:32 AM, Jörg Prante joergprante@gmail.com wrote:
It's nearly impossible to manage 20k fields. The reasons are: each field consumes on the order of a few MB of resident memory in Lucene, and each field mapping creation causes cluster blocks and proliferation of the mapping settings throughout the cluster nodes. Even if you manage to get 20k fields created, constructing a query over 20k fields and the lookup time for each field's settings and mappings will eat up your performance.
Rule of thumb: facets are designed to perform well over a small number of fields with a high cardinality of values. They do not perform well over a large number of fields with low cardinality.
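As a sketch only (field names invented, not a worked-out recommendation): if every rating were indexed as its own small document, or as a nested object, with just two fields, "aspect" (string, high cardinality) and "rating" (float), then a single terms_stats facet over a date-filtered query could return the count and mean per aspect in one request:

{
  "query": {
    "range": { "date": { "from": "2012-01-01", "to": "2012-06-30" } }
  },
  "facets": {
    "avg_rating_per_aspect": {
      "terms_stats": {
        "key_field": "aspect",
        "value_field": "rating",
        "size": 100
      }
    }
  }
}

The aspect/rating pairing has to be preserved at index time (separate documents or nested objects), otherwise multi-valued fields lose the association.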
I would also be curious to learn about the scenario in which such a high
number of fields is required.
Jörg
On Saturday, November 10, 2012 11:12:51 PM UTC+1, revdev wrote:
Hi,
I am using dynamic templates and I am dealing with a couple of thousand such dynamic fields. Each of these dynamic fields is an object with 2 to 3 subfields of type "byte". My question is: is there any performance penalty for having a large mapping? Right now I have a couple of thousand fields in the mapping, but in the future it could grow to maybe 10k-20k fields. Will I see performance degradation with a large mapping file? If so, what will be affected? FYI, I am planning to use facets over these fields.
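A rough sketch of the kind of dynamic template I mean (the names "mytype", "aspects" and "byte_subfields" are invented, and I am assuming the numeric subfields should be coerced to byte):

{
  "mytype": {
    "dynamic_templates": [
      {
        "byte_subfields": {
          "path_match": "aspects.*",
          "match_mapping_type": "long",
          "mapping": { "type": "byte" }
        }
      }
    ],
    "properties": {
      "date": { "type": "date" }
    }
  }
}

Every new numeric subfield that shows up under "aspects" gets mapped as byte, so the mapping grows by a few entries per new dynamic field.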
Thanks!