One-to-many relationships with recursive relationships

Hi.

I'm running a pilot to introduce ES as the search engine in our
product.

The main entity to index is "content", which is pretty
straightforward to index.
The problems arise when dealing with content placement
(categories, menu items), security and other metadata.

Let's take categories as an example. Each content item lives in
exactly one category. Categories are arranged as trees.

A typical query could be like: "Fetch all content in a category X and
include all sub-categories"

Let's say the category tree could be e.g. 10 levels deep, so we
potentially get a large number of categories to consider.

Things to consider:

  1. Response-time
  2. Updating data, e.g. moving a category

Approach 1: Include the category-id in the content index, resolve all
sub-categories before the query and apply them as a filter.
Pros: Easy to maintain
Cons: Potentially very large filters, say 1000 category-keys
-> how bad is that?
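
To make approach 1 concrete, here is a minimal sketch of what I have in mind. The category table, the `category_id` field name and the `bool`/`filter`/`terms` query shape are assumptions for illustration, not something I have settled on; the tree is resolved in the application and only the resulting id list is sent to ES.

```python
from collections import defaultdict

# Hypothetical (category_id, parent_id) pairs; None marks a root category.
CATEGORIES = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, None)]


def descendants(root_id, categories):
    """Return root_id plus the ids of every category below it."""
    children = defaultdict(list)
    for cat_id, parent_id in categories:
        children[parent_id].append(cat_id)

    found, stack = [], [root_id]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children[current])
    return sorted(found)


def content_in_subtree_query(root_id, categories):
    """Build a query body that filters on the assumed `category_id` field."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"category_id": descendants(root_id, categories)}}
                ]
            }
        }
    }
```

With this layout, moving a category only changes the category table; the content documents stay untouched, but the filter grows with the size of the subtree.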

Approach 2: Store the category path in the content, and query with the
main category and the number of levels to include.
Pros: Simple query
Cons: Hard to maintain, e.g. when moving a category
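
A rough sketch of approach 2, assuming each content document carries the ids of all its ancestor categories in a `category_path` field plus a `category_depth` field (both names are made up for this example). The query is then a single `term` filter, optionally combined with a `range` on depth to limit the number of levels.

```python
# Hypothetical parent map: category_id -> parent_id (None marks a root).
PARENTS = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}


def category_path(category_id, parents):
    """Return the ids from the root down to category_id, inclusive."""
    path = []
    current = category_id
    while current is not None:
        path.append(current)
        current = parents[current]
    return path[::-1]


def content_doc(content_id, title, category_id, parents):
    """Build a content document that carries its full category path."""
    path = category_path(category_id, parents)
    return {
        "id": content_id,
        "title": title,
        "category_id": category_id,
        "category_path": path,        # assumed field: all ancestor ids
        "category_depth": len(path),  # assumed field: depth of own category
    }


def subtree_query(root_id, parents, max_levels=None):
    """Match content under root_id, optionally only max_levels below it."""
    filters = [{"term": {"category_path": root_id}}]
    if max_levels is not None:
        root_depth = len(category_path(root_id, parents))
        filters.append(
            {"range": {"category_depth": {"lte": root_depth + max_levels}}}
        )
    return {"query": {"bool": {"filter": filters}}}
```

The maintenance problem shows up in `content_doc`: when a category moves, every content document below it has a stale `category_path` and has to be reindexed.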

Approach 3: I then thought about normalizing the data, storing content
with its category-id, and storing each category with its path. But I'm
not quite sure how to map and query this. My initial thought was to use
a parent-child approach, with an "inner query" finding the matching
categories and joining them with all matching content.category-ids,
but I'm not even sure this is possible.
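
One way approach 3 might be expressed, as a sketch only: keep one document per category that lists the ids in its subtree, and let the content query pull that list in through a terms lookup. This assumes an Elasticsearch version that supports terms lookup in the `terms` query; the `categories` index and the `subtree_ids` field are names invented for this example, and older versions may need extra parameters.

```python
def category_doc(category_id, subtree_ids):
    """Category document listing itself and all of its descendants."""
    return {"id": category_id, "subtree_ids": sorted(subtree_ids)}


def content_query_via_lookup(category_id, index="categories"):
    """Filter content on `category_id` using ids stored in a category doc.

    The index name and the `subtree_ids` path are assumptions for this
    sketch; the id list is fetched from the category document at query time.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "terms": {
                            "category_id": {
                                "index": index,
                                "id": str(category_id),
                                "path": "subtree_ids",
                            }
                        }
                    }
                ]
            }
        }
    }
```

Moving a category would then mean rewriting `subtree_ids` on the affected ancestor category documents, while the content documents keep only their own category-id.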

Any thoughts on how to approach this? Or any other approaches?

Hi Runar,

I know this is sort of old, but I didn't see a reply. Were you able to
determine what would be the best solution for this problem? We have a
similar issue where we are trying to query all sub-folders of the current
folder (in a file system index).

Thanks,

Sky

On Friday, October 28, 2011 2:34:23 PM UTC-7, Runar Myklebust wrote:
