Derive index with reversed nesting

I have an index of products with a (denormalized) m:n relationship to categories.
My goal is to derive a categories index from it which contains the same information, but with the relationship reversed.

What I have

# My source index
PUT /products
{
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "categories": {
                "type": "nested",
                "properties": {
                    "cat_name": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

# .. with many documents like this:
POST /products/_doc
{
  "name": "radio",
  "categories": [
    { "cat_name": "audio" },
    { "cat_name": "electronics" }
  ]
}

POST /products/_doc
{
  "name": "fridge",
  "categories": [
    { "cat_name": "appliances" },
    { "cat_name": "electronics" }
  ]
}

What I would like to create from that

PUT /categories
{
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "products": {
                "type": "nested",
                "properties": {
                    "prod_name": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

{
  "name": "appliances",
  "products": [
    { "prod_name": "fridge" }
  ]
}

{
  "name": "audio",
  "products": [
    { "prod_name": "radio" }
  ]
}

{
  "name": "electronics",
  "products": [
    { "prod_name": "fridge" },
    { "prod_name": "radio" }
  ]
}

How do I query the products so they are grouped by category but still contain all the documents (products) of each group?
In case it matters, products actually have many more fields and I also want to see them in the categories index.

Also I'm not sure what's the best way to fill this new index. I imagine this can be done using an in-cluster transformation or by querying/composing programmatically.

Hey,

while you could always run a self written python script to 'invert' that relationship & data into a new index - my first thought around this is, that this is usually done, because the data cannot be queried the way it should be. If so, maybe you can add some more context around how data is queried? Having a second index usually means quite a bit more work of keeping it in sync, and I'd always try to not do that if possible. Maybe you can add some more context how you want to be able to query your data.

Thanks!

--Alex

The goal is to have two views: one where a product is shown with its categories, another which shows the products of each category.

Keeping both indexes is not a problem here because we write both of them as new indexes, then switch an alias to use the new ones.

However, I get your point of trying to use one index for both cases. My takeaway from a previous project was that it's often better to create one index per "view" but there is probably a fine line between the two approaches.

Follow up question: What makes those views so unique, that they cannot be fed from the same index with different queries/filters/aggregations? :slight_smile:

I am aware that there are use-cases where this is not possible, but I'm still mainly curious (not critisizing your approach or anything).

Thanks for your help Alex! Your questions absolutely make sense.

Products and categories have many fields (the example code is simplified).
In the product view I only need the category name and ID, in the category view I also only need a handful of properties for each product.

Since I already have the complete category data in memory at indexing time it would be easy to look them up by ID when creating the category-based index. The alternative would be to repeat the complete category data in each product that belongs to the category. I know redundant data is okay but it appeared a bit extreme to me here.

Indeed full denormalization looks extreme, but in any case you would need to have a lifecycle mechanism to either update both indices at the same time or your denormalized documents.

Another alternative might be the join datatype, see Join field type | Elasticsearch Guide [8.1] | Elastic

1 Like