Cross Index Filters


(Jan Fiedler) #1

A common requirement for searches in web shops is to filter products
based on some 'other' criteria that are not directly bound to the
product data life cycle. Examples of such 'other' data are stock-level
information (most prominently), but also product ratings or prices. The
important thing here is that the update frequency (and source) of
this 'other' data is completely separate from that of the actual
product data.

I know that I could index this 'other' data along with the product
data to have it available for filtering. However, this would cause
constant reindexing of unchanged data (the product data), which would
most likely hurt cache efficiency badly.

A perfect solution would be to represent the different aspects of the
product data in different indexes (e.g. product index, availability
index, price index, rating index) and to be able to query the product
index with filters into the complementary indexes. Relationships
between the documents in the different indexes can be maintained by
the id field (i.e. a single product would have a document in multiple
indexes keyed by the same SKU).
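To make the proposed layout concrete, here is a minimal application-side sketch. It assumes each "index" is just an in-memory dict keyed by SKU; the names `cross_index_filter`, `availability`, and `prices` are hypothetical, and no Elasticsearch API is used. It shows only the shape of the idea: product documents are never rewritten when stock or price changes, and the complementary indexes are joined to the product index at query time on the shared id.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes: a document is a field -> value dict, and an
# "index" maps a SKU (the shared id) to that product's document.
Doc = Dict[str, object]
Index = Dict[str, Doc]
Filter = Tuple[Index, Callable[[Doc], bool]]


def cross_index_filter(products: Index, filters: List[Filter]) -> List[str]:
    """Return the SKUs from `products` whose documents in every
    complementary index satisfy that index's predicate.

    A product with no document in one of the complementary indexes
    is excluded. Product documents are only read, never rewritten,
    so stock or price updates touch only their own index.
    """
    return [
        sku
        for sku in products
        if all(sku in index and predicate(index[sku]) for index, predicate in filters)
    ]


# Hypothetical sample data: one product index plus two complementary indexes.
products = {
    "SKU-1": {"name": "Kettle"},
    "SKU-2": {"name": "Toaster"},
    "SKU-3": {"name": "Blender"},
}
availability = {"SKU-1": {"stock": 5}, "SKU-2": {"stock": 0}, "SKU-3": {"stock": 12}}
prices = {"SKU-1": {"price": 29.0}, "SKU-3": {"price": 89.0}}

# In stock AND cheaper than 50.
in_stock_under_50 = cross_index_filter(
    products,
    [
        (availability, lambda d: d["stock"] > 0),
        (prices, lambda d: d["price"] < 50.0),
    ],
)
print(in_stock_under_50)  # ['SKU-1']
```

This sketch scans every product, so it says nothing about doing such a join efficiently inside Lucene, which is the hard part of the feature.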

Has anybody had the same problem, and maybe found a good solution for it?
Would this make a useful feature for ES?


(Shay Banon) #2

This is a very valid requirement, but sadly not easy to implement at all (in
an efficient manner). These kinds of "joins" are problematic in Lucene, though
I have some ideas on how to try and solve it (as efficiently as possible). I
might try and tackle this for a later version, though certainly not 0.9.

-shay.banon

On Tue, Jun 22, 2010 at 2:21 PM, Jan Fiedler fiedler.jan@gmail.com wrote:


(medcl) #3

Hey, kimchy, is this feature available now?

