To denormalize or not to denormalize

Greetings!

First, thanks to Elastic for the great software they create.

I am working on a product catalog search function that needs some changes to accommodate customer needs. A product in our setup can have variants; a variant is a product with specific attributes such as size, color, etc.

For example:

Shirt - White, Large
Shirt - Blue, Large

These are variants of the same product. Some fields are unique to each variant: price, weight, images, SKU and stock information.

All other fields (name, short description, long description, tags, labels, related products, etc.) are stored at the product level.

Currently we index products, and for each product we also index the attributes available across its variants. This lets us filter on, for example, "Large, Blue" and "In Stock". However, a match only tells us that the product can be bought as "Large, Blue" and that some variant is in stock. It doesn't check that the "Large, Blue" variant itself is the one in stock.
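To make the problem concrete, here is a small Python sketch that contrasts filtering on product-level attribute lists with matching against individual variants. All field names and data are hypothetical, and the code does not call Elasticsearch; it only mimics how independent filters on flattened, product-level fields behave.

```python
# Hypothetical variants of one product (field names are assumptions).
variants = [
    {"color": "White", "size": "Large", "in_stock": True},
    {"color": "Blue", "size": "Large", "in_stock": False},
]

# Product-level document as described above: attributes and stock are
# aggregated across variants, so the link between them is lost.
product_doc = {
    "name": "Shirt",
    "colors": sorted({v["color"] for v in variants}),
    "sizes": sorted({v["size"] for v in variants}),
    "in_stock": any(v["in_stock"] for v in variants),
}

def product_level_match(doc, color, size):
    # Each condition is checked against the whole product independently.
    return color in doc["colors"] and size in doc["sizes"] and doc["in_stock"]

def variant_level_match(variants, color, size):
    # The intended semantics: a single variant must satisfy all conditions.
    return any(
        v["color"] == color and v["size"] == size and v["in_stock"]
        for v in variants
    )

print(product_level_match(product_doc, "Blue", "Large"))  # True (false positive)
print(variant_level_match(variants, "Blue", "Large"))     # False
```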

To fix this, we first looked at storing the variants as nested fields on the products. However, this led to some annoyingly complex queries in certain cases, and I have read that nested fields are not the best choice for performance. This is made worse by the fact that each variant also has multiple prices nested inside it.
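For reference, a nested approach could look roughly like the following sketch. It builds an Elasticsearch mapping and query as plain Python dicts; the field names (`variants`, `sku`, `color`, `size`, `in_stock`, `price`) are assumptions, not an actual schema. The `nested` query makes all conditions in its `bool` filter apply to the same variant, which is exactly the association the product-level setup loses.

```python
import json

# Hypothetical mapping: variants stored as a nested field on each product.
PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "variants": {
                "type": "nested",
                "properties": {
                    "sku": {"type": "keyword"},
                    "color": {"type": "keyword"},
                    "size": {"type": "keyword"},
                    "in_stock": {"type": "boolean"},
                    "price": {"type": "scaled_float", "scaling_factor": 100},
                },
            },
        }
    }
}

def nested_variant_filter(color, size, in_stock=True):
    """Build a query where all conditions must hold for the same variant."""
    return {
        "query": {
            "nested": {
                "path": "variants",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"variants.color": color}},
                            {"term": {"variants.size": size}},
                            {"term": {"variants.in_stock": in_stock}},
                        ]
                    }
                },
                # Also return the matching variants for each product hit.
                "inner_hits": {},
            }
        }
    }

print(json.dumps(nested_variant_filter("Blue", "Large"), indent=2))
```

If each variant also held an array of prices as its own nested objects, that would require a second nested level (path `variants.prices`) and a nested query inside the first one, which is likely part of where the query complexity comes from.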

Would a better approach be to store each variant as a completely separate document in the index, with the shared product data copied into every one of them, and stop indexing products? There would be a lot of data duplication, but I reckon it could be a trade-off worth making. It might also be worth mentioning that we plan to use the data in Elasticsearch to generate product feeds (Facebook, Google) in order to offload our traditional database servers. These feeds usually have one row per variant.
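A minimal sketch of the denormalization step, assuming a product record that carries its variants as a list (all field names and values are hypothetical): each variant becomes its own document with the shared product fields copied in. Using the SKU as the document `_id` and keeping `product_id` on every document would make it possible to group hits by product again at query time, for example with field collapsing on `product_id`.

```python
# Hypothetical source record; field names are assumptions for illustration.
PRODUCT = {
    "product_id": "shirt-1",
    "name": "Shirt",
    "short_description": "Cotton shirt",
    "tags": ["summer", "cotton"],
    "variants": [
        {"sku": "SHIRT-WH-L", "color": "White", "size": "Large", "price": 19.99, "stock": 4},
        {"sku": "SHIRT-BL-L", "color": "Blue", "size": "Large", "price": 19.99, "stock": 0},
    ],
}

def denormalize(product):
    """Yield one document per variant, copying the shared product fields into each."""
    shared = {k: v for k, v in product.items() if k != "variants"}
    for variant in product["variants"]:
        doc = dict(shared)   # shallow copy of the product-level data
        doc.update(variant)  # variant-specific fields (SKU, price, stock, ...)
        doc["in_stock"] = variant["stock"] > 0
        yield doc

docs = list(denormalize(PRODUCT))
for doc in docs:
    print(doc["sku"], doc["name"], doc["in_stock"])
```

Since each document is one variant, the same documents map directly onto the one-row-per-variant layout of the product feeds.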

I would be grateful for any input on this.

Best regards

Hi Henrik,

If I understand correctly, you have an array of objects and want to preserve the association between the fields of each object in that array. You can use nested fields, as you are already doing, but they add some overhead.

You can also create one document for every item of the array. Alternatively, you can create a new field whose values concatenate the individual fields, like this:

"Blue#Large", "Blue#Large#VNeck"

This technique only works with keyword fields, and only if you don't need aggregations on the individual values (for example, counting how many Large combinations you have). You store every combination of options you need in the field as an array.
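A sketch of how such a combination field could be generated at index time, in Python. The attribute order, the `#` separator and the choice to include only in-stock variants are assumptions made for illustration. What matters is that the same fixed attribute order is used when building the filter value at query time.

```python
from itertools import combinations

# Assumed fixed attribute order; keys must be built in this order everywhere.
ATTRIBUTE_ORDER = ("color", "size", "neck")

def combination_keys(attributes, order=ATTRIBUTE_ORDER, sep="#"):
    """Return every non-empty combination of a variant's attribute values,
    joined with `sep`, respecting the fixed attribute order."""
    values = [attributes[name] for name in order if name in attributes]
    keys = []
    for r in range(1, len(values) + 1):
        for combo in combinations(values, r):
            keys.append(sep.join(combo))
    return keys

def product_combination_field(variants, only_in_stock=True):
    """Collect the combination keys of a product's variants into one array,
    optionally skipping variants that are out of stock."""
    keys = set()
    for variant in variants:
        if only_in_stock and not variant.get("in_stock"):
            continue
        keys.update(combination_keys(variant))
    return sorted(keys)

print(combination_keys({"color": "Blue", "size": "Large", "neck": "VNeck"}))
print(product_combination_field([
    {"color": "White", "size": "Large", "in_stock": True},
    {"color": "Blue", "size": "Large", "in_stock": False},
]))
```

A filter for in-stock "Blue, Large" would then be a single `term` query on that field, e.g. `{"term": {"variant_combos": "Blue#Large"}}`, where `variant_combos` is a hypothetical keyword field.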

We generally recommend denormalizing whenever possible, but you will need to test to find out which technique works best for your use case.
