I apologize in advance if I should have started a new thread for this question, but it is related to the discussion above. I also apologize for the long-winded post; I just want to make sure I get the message across.
Just a quick background on our solution. We have built a Spring web app that sits on top of an ElasticSearch cluster and allows the Search Business team to define all the facets, synonyms, query boosts, recommended products, etc., for the search solution at their respective ecommerce sites. For those of you who are familiar with FAST search, our Spring app is similar to their Business Manager Tool.
For facets, we use the ES mappings API to present the user with a list of all indices in the ES cluster. Based on their selection, we then present all the attributes/properties included in the selected indices, and they pick which attribute/property to use for that particular facet. They can then configure the related settings (e.g. for a range facet, the ranges; for a term facet, how many values to display initially).
So we don’t necessarily have the “dynamic facet” issue that has been discussed here. The challenge we are facing, which is somewhat relevant to this discussion, is how to manage what “context” a facet should appear in. What I mean is that a given facet might only be included if the customer is in the “Electronics” category, or it might be slotted globally and returned with all queries. The problem is that over time the business will potentially create hundreds of facets, and we cannot have every request to ES tagged with hundreds of facet filters. Even if performance were not an issue (and in most of our cases, performance is the only issue), this would not work given the following scenario (and there are others like it).
The search business team will set up multiple price range facets, each intended to only be evaluated and returned given the context the customer may be in. For example, if the customer is in the TVs category, the price range facet may look like:
$0 - $200
$200 - $400
$400 - $600…
But if they are in the cables category, it might look like:
$0 - $5
$5 - $10
$10 - $15…
So the business will create two different facets with the corresponding ranges and attach the appropriate “category” context to each. Therein lies our challenge: how do we implement this?
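To make the idea of a facet with a "context" concrete, here is a minimal Python sketch of how the business-managed facet definitions could be modeled and looked up by category. Everything in it is an assumption for illustration: the `FacetDef` structure, the `GLOBAL` marker, the `price` and `brand` field names, and the category names are hypothetical and not taken from our actual app. Only the price ranges listed above are included; the real lists are longer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

GLOBAL = "*"  # hypothetical marker meaning "return this facet with every query"


@dataclass(frozen=True)
class FacetDef:
    name: str                      # label chosen by the business user
    field: str                     # index attribute/property the facet is built on
    kind: str                      # "range" or "terms"
    contexts: FrozenSet[str]       # categories the facet applies to, or {GLOBAL}
    ranges: Tuple[Tuple[int, int], ...] = ()  # (from, to) pairs, range facets only


# Hypothetical contents of the business-managed facet store.
FACETS: List[FacetDef] = [
    FacetDef("price_tvs", "price", "range", frozenset({"TVs"}),
             ((0, 200), (200, 400), (400, 600))),
    FacetDef("price_cables", "price", "range", frozenset({"Cables"}),
             ((0, 5), (5, 10), (10, 15))),
    FacetDef("brand", "brand", "terms", frozenset({GLOBAL})),
]


def facets_for(categories: Iterable[str]) -> List[FacetDef]:
    """Return the facets whose context matches any given category, plus global ones."""
    wanted = set(categories) | {GLOBAL}
    return [f for f in FACETS if f.contexts & wanted]


if __name__ == "__main__":
    print([f.name for f in facets_for(["TVs"])])
    print([f.name for f in facets_for([])])
```

With a lookup like this, the request sent to ES only carries the handful of facets that apply to the current context instead of every facet the business has ever defined.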
As jtreher mentioned above, if we only needed to support this within the customer’s browse experience (like Amazon), the solution would be rather straightforward: we would always know what category context the customer is in at any point in time, so we could extract the matching facets from our cache and include them in our ES query. However, the business requirement is that we also support this model when the customer issues a global/ad-hoc query. In this scenario we are in a bit of a chicken-and-egg situation, in that we won’t know what contexts are included until we evaluate the initial results, and only then can we apply the appropriate facet filtering.
Again, for those of you who have worked with FAST and its ecommerce layer, you know that they support all of this out of the box. I don’t know all the nitty-gritty details of how they have implemented their contextual capability, but what I do know, and don’t like, is that they require a full re-index when new facets are created or their context is changed. So presumably they are storing some metadata with each document that indicates what facet/context it belongs to, and this is used at query time to derive facets. Search query performance is therefore optimized, but at the pretty heavy cost of full re-indexing.
That all being said, I am struggling to find the best solution with ES. The different options I am looking at now are:
1. Two-step query: invoke the global query with a category facet to get all potential category contexts, then go to the business-managed facet cache, find all facets that match the categories returned from the initial query, and resubmit the query with all the appropriate facet filters. Pros: should work. Cons: very concerned about the performance impact, in both additional query load and latency.
2. Do something like what FAST does: when a business user creates a facet with a context in the business tool, add some metadata to the corresponding documents in the index, then leverage that metadata at query time to create the necessary facets. This might be similar to the facet flattening solution described above. Pros: query performance should be better than option 1. Cons: not clear what would need to be implemented or whether this would work with ES, and we don’t want to get into a situation where re-indexing is required for every facet transaction.
3. Draw a line in the sand and only support contextual faceting under the “Browse” scenario.
4. Other solutions???
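To show what I mean by the two-step query in the first option, here is a rough Python sketch. It only builds the request bodies and takes the actual search call as a parameter, so it is not tied to any particular client. The bodies follow the ES facets request format as I understand it (`terms`, `range`, and `facet_filter`); the `FACET_CACHE` contents, the `category`, `price`, and `brand` field names, and the `search` callable are all hypothetical placeholders, so treat this as a sketch under those assumptions and not a tested implementation.

```python
from typing import Callable, Dict, List

GLOBAL = "*"  # hypothetical marker for globally slotted facets

# Hypothetical stand-in for the business-managed facet cache.
FACET_CACHE = [
    {"name": "price_tvs", "field": "price", "kind": "range", "context": "TVs",
     "ranges": [(0, 200), (200, 400), (400, 600)]},
    {"name": "price_cables", "field": "price", "kind": "range", "context": "Cables",
     "ranges": [(0, 5), (5, 10), (10, 15)]},
    {"name": "brand", "field": "brand", "kind": "terms", "context": GLOBAL},
]


def base_query(text: str) -> Dict:
    return {"query_string": {"query": text}}


def probe_body(text: str) -> Dict:
    """Step 1: ask only for the categories present in the result set (no hits)."""
    return {
        "query": base_query(text),
        "size": 0,
        "facets": {"categories": {"terms": {"field": "category", "size": 20}}},
    }


def categories_in(response: Dict) -> List[str]:
    """Pull the category terms out of the step 1 facet response."""
    terms = response.get("facets", {}).get("categories", {}).get("terms", [])
    return [t["term"] for t in terms if t["count"] > 0]


def faceted_body(text: str, categories: List[str]) -> Dict:
    """Step 2: same query, carrying only the facets whose context matched."""
    facets = {}
    for f in FACET_CACHE:
        if f["context"] != GLOBAL and f["context"] not in categories:
            continue
        if f["kind"] == "range":
            spec = {"range": {"field": f["field"],
                              "ranges": [{"from": lo, "to": hi} for lo, hi in f["ranges"]]}}
        else:
            spec = {"terms": {"field": f["field"], "size": 10}}
        if f["context"] != GLOBAL:
            # Assumes `category` is indexed un-analyzed so a term filter matches exactly.
            spec["facet_filter"] = {"term": {"category": f["context"]}}
        facets[f["name"]] = spec
    return {"query": base_query(text), "facets": facets}


def two_step_search(text: str, search: Callable[[Dict], Dict]) -> Dict:
    """`search` is any callable that sends a request body to ES and returns the JSON."""
    probe = search(probe_body(text))
    return search(faceted_body(text, categories_in(probe)))
```

This makes the cost visible: every ad-hoc query becomes two round trips. It also leaves open what to do when a global query matches several categories at once; as written, it would return both price facets, so some rule (for example, only the dominant category) would probably be needed.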
Thanks for any feedback, and a big thanks to all who have contributed to ElasticSearch. It totally rocks!