Modeling data with a parent-child relationship for fast queries


#1

I'm modeling our data and trying to decide on the best pattern to facilitate fast queries.

I would be happy to get some input on best practices. I will try to explain the relationships and the types of queries we are performing.

The ES index stores product data. Products belong to groups (i.e. products in the same group are variations of the same item).
Each group holds one or more products (the variations are usually based on color, size, availability, etc.). The number of products per group is not high (usually <20).

Product groups store shared metadata about the products in the group.
Some metadata is calculated once per day (such as the "group score"), while other metadata may be updated when the products in that group change.

Products are also assigned a "product score" which is calculated periodically.

Products themselves can be updated multiple times a day, but these are usually "small" modifications. For instance, each product has an "in_stock" or "quantity" field, which can change more frequently.

Also, new products can be added to a group or removed from it. In other words, we need to support "mutation", "insert", and "delete" of products in a group.
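
To make the three operations concrete, here is a minimal sketch of a bulk request body, assuming a flat model with one document per product. The index name "products", the document IDs, and the "group_id" field are made up for illustration. On ES 6 the action metadata would also need a mapping type (in the metadata or in the URL).

```python
import json

def to_bulk_ndjson(actions):
    """Serialize (op, metadata, body) tuples into the newline-delimited bulk format."""
    lines = []
    for op, meta, body in actions:
        lines.append(json.dumps({op: meta}))
        if body is not None:  # "delete" actions have no source line
            lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

# Hypothetical index name and IDs, for illustration only.
actions = [
    # "mutation": partial update of a frequently changing field
    ("update", {"_index": "products", "_id": "p-1"},
     {"doc": {"in_stock": False, "quantity": 0}}),
    # "insert": a new variation added to group g-1
    ("index", {"_index": "products", "_id": "p-9"},
     {"group_id": "g-1", "color": "blue", "in_stock": True}),
    # "delete": a variation removed from the group
    ("delete", {"_index": "products", "_id": "p-3"}, None),
]

payload = to_bulk_ndjson(actions)
print(payload)
```

With a nested model, by contrast, changing a single product means reindexing the whole group document, since nested objects are stored together with their owner.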

The queries we are performing are product "recommendations" based on various context attributes of the user (affinity, CF, bought together, etc.). For instance, we can search for products where "color = red" and "brand = nike" and apply an affinity query for that user. The queries are boosted with different weights on the different attributes.
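
As a rough illustration of this kind of query, the sketch below builds a bool query with hard filters plus weighted "should" clauses. The affinity attributes ("category", "gender") and their weights are invented for the example, and the fields are assumed to be mapped as keyword.

```python
import json

def build_recommendation_query(filters, affinities):
    """filters: {field: required value}; affinities: [(field, value, weight), ...]."""
    return {
        "query": {
            "bool": {
                # Hard conditions: must match, do not contribute to the score.
                "filter": [{"term": {f: v}} for f, v in filters.items()],
                # Soft preferences: each match adds to the score, scaled by its boost.
                "should": [
                    {"term": {f: {"value": v, "boost": w}}}
                    for f, v, w in affinities
                ],
            }
        }
    }

query = build_recommendation_query(
    {"color": "red", "brand": "nike"},
    [("category", "running", 2.0), ("gender", "women", 1.5)],  # hypothetical affinities
)
print(json.dumps(query, indent=2))
```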

Query results are sorted by group score and product score, returning the best product of each group that satisfies the query conditions (i.e. the product color matches the query definition).

We need to support paging, i.e. being able to fetch a certain number of items each time. We also want to support faceting, so we need to be able to group by a certain attribute (for instance, items that satisfy the base query but where "color = blue" or "brand = nike") and know the count of items that satisfy each facet.
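
A minimal sketch of paging plus facet counts in one request, assuming a flat product index with keyword fields. The function name and the "by_<field>" aggregation names are made up. Note that in this form the counts are per product, not per group.

```python
def build_faceted_page(base_query, facet_fields, page, page_size=20):
    """Return a search body for one page of hits plus a terms aggregation per facet."""
    return {
        "from": page * page_size,  # zero-based page index
        "size": page_size,
        "query": base_query,
        "aggs": {
            # Each bucket's doc_count is the number of matching documents for that value.
            f"by_{field}": {"terms": {"field": field, "size": 10}}
            for field in facet_fields
        },
    }

body = build_faceted_page({"match_all": {}}, ["color", "brand"], page=2)
print(body["from"], sorted(body["aggs"]))
```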

Regarding the "nested documents" model, I want to understand the "penalty" compared to the "parent-child" model. I understand that ES creates "hidden" documents for the nested objects, and that they sit on the same shard as the owning doc. Also, from my understanding, ES 6 deprecates multiple mapping types in the same index, so it seems the "parent-child" model is irrelevant moving forward. Is this the case? Should I use nested objects rather than parent-child to describe this type of relationship? Does parent-child still have a place in ES, assuming that the parent and child are of different "types"?
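
To make the comparison concrete, here is a rough sketch of how the two mappings might look. All field names are assumptions. The second variant uses the "join" field type, which is how ES 6 expresses parent-child within a single mapping type. Where the "properties" object sits in the index-creation body depends on the ES version.

```python
# Option A: one document per group, products embedded as nested objects.
nested_mapping = {
    "properties": {
        "group_id": {"type": "keyword"},
        "group_score": {"type": "float"},
        "products": {
            "type": "nested",
            "properties": {
                "color": {"type": "keyword"},
                "brand": {"type": "keyword"},
                "in_stock": {"type": "boolean"},
                "product_score": {"type": "float"},
            },
        },
    }
}

# Option B: groups and products as separate documents linked by a join field.
# Child documents have to be indexed with routing set to the parent's ID.
join_mapping = {
    "properties": {
        "relation": {"type": "join", "relations": {"group": "product"}},
        "group_score": {"type": "float"},
        "color": {"type": "keyword"},
        "brand": {"type": "keyword"},
        "in_stock": {"type": "boolean"},
        "product_score": {"type": "float"},
    }
}

print(nested_mapping["properties"]["products"]["type"])
print(join_mapping["properties"]["relation"]["relations"])
```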

Regarding query behavior:
Initially I ran an aggregation query on "group id", sorting by "group score" and taking the best result in each group. This is extremely slow. I tried to profile it, and it seems that the aggregation portion takes the majority of the time. I also tried to use "index sorting" on group score (and also on "group id", or both), but didn't get any improvement.
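
The exact query is not shown here, but an aggregation of this shape could look roughly like the sketch below: a terms aggregation on the group, ordered by the group score, with a top_hits sub-aggregation picking the best product. The field names "group_id", "group_score", and "product_score" and the aggregation names are assumptions.

```python
def build_best_per_group(base_query, max_groups=10):
    """Bucket matching products by group, order buckets by group score, keep one hit each."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": base_query,
        "aggs": {
            "groups": {
                "terms": {
                    "field": "group_id",
                    "size": max_groups,
                    # Order buckets by the metric sub-aggregation defined below.
                    "order": {"max_group_score": "desc"},
                },
                "aggs": {
                    "max_group_score": {"max": {"field": "group_score"}},
                    "best_product": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"product_score": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }

agg_body = build_best_per_group({"term": {"color": "red"}})
print(agg_body["aggs"]["groups"]["terms"])
```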

