First, thanks to Elastic for the great software they create.
I am working on a product catalog search function that needs some changes to accommodate customer needs. In our setup a product can have variants. A variant is a product with a specific combination of attributes such as size and color, for example:
Shirt - White, Large
Shirt - Blue, Large
These are variants of the same product. Some fields are unique to each variant: price, weight, images, SKU and stock information.
Other fields (name, short description, long description, tags, labels, related products etc.) are all stored at the product level.
At this point we index products, and when doing so we also index which attributes are available for them. With this setup we can filter on, for example, "Large, Blue" and "In Stock". Right now that only tells us that the matching products can be bought as "Large, Blue" and that some variant is in stock. It doesn't check that the "Large, Blue" variant is the one that is actually in stock.
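To make the problem concrete, here is a small Python sketch of what I believe is happening. It does not talk to Elasticsearch at all; it only mimics the matching logic. The field names (`sizes`, `colors`, `in_stock`) are made up for the example and are not our real mapping.

```python
# Hypothetical variants of one product (field names are assumptions).
variants = [
    {"size": "Large", "color": "White", "in_stock": True},
    {"size": "Large", "color": "Blue", "in_stock": False},
]

# Roughly how the product is indexed today: attributes and stock status
# are flattened to the product level, so the link between them is lost.
product_doc = {
    "name": "Shirt",
    "sizes": sorted({v["size"] for v in variants}),
    "colors": sorted({v["color"] for v in variants}),
    "in_stock": any(v["in_stock"] for v in variants),
}


def matches_flattened(doc, size, color):
    """Mimic the current filter: every condition is checked independently."""
    return size in doc["sizes"] and color in doc["colors"] and doc["in_stock"]


def matches_per_variant(variants, size, color):
    """What we actually want: one single variant satisfies all conditions."""
    return any(
        v["size"] == size and v["color"] == color and v["in_stock"]
        for v in variants
    )


print(matches_flattened(product_doc, "Large", "Blue"))  # True
print(matches_per_variant(variants, "Large", "Blue"))   # False
```

The product matches the flattened filter because the white shirt is in stock, even though the blue one is not.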
To fix this we first looked at storing variants as nested fields on the products. This did, however, end up producing some annoyingly complex queries in some cases, and I have read that nested fields are not the best for performance, especially since each variant also has multiple prices nested inside it.
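For reference, the simplest form of the nested query we tried looks roughly like the body built below. This is only a sketch: the nested path `variants` and the field names under it are assumptions about the mapping, and the code just builds and prints the request body without sending it anywhere.

```python
import json


def nested_variant_query(size, color):
    """Build a search body that requires ONE nested variant to match all filters.

    Assumes a hypothetical mapping where "variants" is a nested field with
    keyword fields "size" and "color" and a boolean field "in_stock".
    """
    return {
        "query": {
            "nested": {
                "path": "variants",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"variants.size": size}},
                            {"term": {"variants.color": color}},
                            {"term": {"variants.in_stock": True}},
                        ]
                    }
                },
            }
        }
    }


body = nested_variant_query("Large", "Blue")
print(json.dumps(body, indent=2))
```

This one is still readable, but as far as I understand, filtering on the prices would need a second nested clause inside this one, and that is where the queries started to get hard to maintain.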
Would a better approach be to store the variants as completely separate documents in the index, with the shared product data present in all of them, and stop indexing products? There would be a lot of data duplication, but I reckon it could be a trade-off worth making. It might be worth mentioning that we will also transition to using the data from Elasticsearch for product feeds (Facebook, Google) to offload our traditional database servers. These feeds are often one row per variant.
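A rough sketch of the denormalization I have in mind is shown below. The product structure, the field names and the `product_id` back-reference are all hypothetical; the point is only that every variant becomes one flat document carrying a copy of the shared product fields.

```python
# Hypothetical source record: shared fields at product level, plus variants.
product = {
    "id": "shirt-1",
    "name": "Shirt",
    "tags": ["clothing"],
    "variants": [
        {"sku": "SH-WH-L", "size": "Large", "color": "White",
         "price": 19.0, "in_stock": True},
        {"sku": "SH-BL-L", "size": "Large", "color": "Blue",
         "price": 21.0, "in_stock": False},
    ],
}


def variant_documents(product):
    """Yield one flat document per variant, copying the shared product fields."""
    shared = {k: v for k, v in product.items() if k not in ("id", "variants")}
    for variant in product["variants"]:
        # product_id keeps a link back to the product the variant belongs to.
        yield {"product_id": product["id"], **shared, **variant}


docs = list(variant_documents(product))


def large_blue_in_stock(doc):
    """All conditions are checked against the same flat variant document."""
    return doc["size"] == "Large" and doc["color"] == "Blue" and doc["in_stock"]


hits = [d["sku"] for d in docs if large_blue_in_stock(d)]
print(len(docs), hits)  # 2 []
```

With this layout the "Large, Blue, In Stock" filter no longer returns a false match, and each document already corresponds to one feed row. The `product_id` field is kept so that variant hits could be grouped back per product if a listing should show products rather than variants.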
I am thankful for any valuable input on this.