Most efficient way to model data in Elasticsearch

psilos · May 26, 2016, 1:00pm

I have an example of modelling an e-commerce site. Say the site has a few hundred shops and a few million products. The number of products per shop ranges from 1,000 to 100,000. I need to be able to aggregate on both the product fields and the shop fields. All products share the same schema, as do all shops.

Product

{
  "productName"
  "price"
  "category"
}

Shop

{
  "shopName"
  "rating"
}

Is it more efficient to have a) 1 index/shop, b) same index and 1 type/shop or c) same index, same type and have a field to determine the shop of the product?

I read some related articles and most of them are in favour of same index and 1 type/shop. But then they say that if there is one single index which has a large number of docs it might be even slower than having multiple indices.

I also need to perform JOINS and aggregations between the shops and the products. For example I need to be able to retrieve all the products from the shops with rating higher than 8/10 and also get the number of products per category. Is it preferable to use a) application-side JOIN, b) parent-child relationships, c) Siren plug-in, d) something else?

warkolm · May 27, 2016, 6:20am

Different indices.
Why do you think you need joins?

psilos · May 28, 2016, 9:30pm

Thanks for that. Can you please elaborate a bit more on why you think 1 index is better/faster?
As for the joins: I first need to get all the shops with a rating above 8/10 and then join that with the products from these shops.
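
For reference, the two-step lookup described here (the application-side JOIN option from the first post) could be sketched roughly as below. This is only a sketch under assumptions: it presumes each product document carries a `shopName` field identifying its shop, which the schemas above do not show, and the helper names are made up for illustration. It only builds the request bodies and reads a response-shaped dict; no client library is involved.

```python
def shops_query(min_rating):
    """Step 1: request body selecting shops rated above min_rating."""
    return {"query": {"range": {"rating": {"gt": min_rating}}}}


def shop_names_from_hits(response):
    """Pull the shop names out of a search response's hits."""
    return [hit["_source"]["shopName"] for hit in response["hits"]["hits"]]


def products_query(shop_names):
    """Step 2: request body limited to those shops, counted per category.

    Assumes a hypothetical `shopName` field on every product document.
    """
    return {
        "query": {"bool": {"filter": [{"terms": {"shopName": list(shop_names)}}]}},
        "aggs": {"products_per_category": {"terms": {"field": "category"}}},
    }


# Hand-written stand-in for the response to step 1 (not real data).
fake_shop_response = {
    "hits": {
        "hits": [
            {"_source": {"shopName": "A", "rating": 9}},
            {"_source": {"shopName": "B", "rating": 8.5}},
        ]
    }
}

names = shop_names_from_hits(fake_shop_response)
step_two_body = products_query(names)
```

With a few hundred shops the list of names passed to step 2 stays small, but it is still two round trips per request.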

magnusbaeck · June 14, 2016, 5:08pm

ES doesn't support joins. You have to model your data differently, typically by de-normalizing it.
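
One possible reading of "de-normalizing" for this example is to copy the shop's fields into every product document, so that a single query can filter on the shop rating and aggregate on the category. The sketch below is an illustration under assumptions, not a recommended mapping: the nested `shop` object, the `shop.rating` path, and the helper names are invented here, and the `terms` aggregation assumes `category` is indexed as an exact (not analyzed) value.

```python
from collections import Counter


def denormalize(product, shop):
    """Return a product document with the shop's fields copied into it."""
    doc = dict(product)
    doc["shop"] = dict(shop)  # hypothetical nested object holding shop fields
    return doc


def search_body(min_rating):
    """One request: filter on the embedded shop rating, count per category."""
    return {
        "query": {"bool": {"filter": [{"range": {"shop.rating": {"gt": min_rating}}}]}},
        "aggs": {"products_per_category": {"terms": {"field": "category"}}},
    }


# Made-up sample data, for illustration only.
shop_a = {"shopName": "A", "rating": 9}
shop_b = {"shopName": "B", "rating": 6}
docs = [
    denormalize({"productName": "lamp", "price": 20, "category": "home"}, shop_a),
    denormalize({"productName": "desk", "price": 150, "category": "home"}, shop_a),
    denormalize({"productName": "pen", "price": 2, "category": "office"}, shop_b),
]

# Local stand-in for what the request body above is meant to compute.
counts = Counter(d["category"] for d in docs if d["shop"]["rating"] > 8)
body = search_body(8)
```

The trade-off is on the write side: when a shop's rating changes, every product document of that shop has to be updated, which for the sizes mentioned above means up to 100,000 documents per shop.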
