Data modeling: nested vs parent/child vs two indices


(dangnhdev) #1

Hi all, I'm facing a problem with "order - product" data
Denormalized data may look like:
A product:

{
    "_id" : productId,
    "name" : name,
    "price" : price,
    "desc" : description
}

An order:

{
    "_id" : orderId,
    "user" : userInfo,
    "products" : [
        {
            "_id" : productId1,
            "name" : name1,
            "price" : price1
        },
        {
            "_id" : productId2,
            "name" : name2,
            "price" : price2
        },
        {
            "_id" : productId3,
            "name" : name3,
            "price" : price3
        }
    ]
}
  • With this design we're having two problems:

    1. I often need to do aggregation query in nested document (product), and it is very slow (about 85% slower in our benchmark) compared to the similar query in the product index only. For example: Show the most popular product or show all product group by total revenue
  1. In many case products in the order must update frequently because we can't receive full order once at a time (business domain)
  • With the parent-child model, you know, I'm aware of join. I haven't explicitly tested it yet, how about grouping order-product by order or get full order-product in one document ?

  • We can create two indices for this: one for product and one for order (with full nested product) and additional data for aggregation too. But this require much of work and duplicate data too!

Can you give me some advice? Thank you very much!


(Mark Harwood) #2

I'm not sure the example comparison makes sense either as a raw benchmark or as alternative methods of deriving the same stat. "Total revenue" and "most popular" are derived from sales data with many records per unique product whereas the product catalogue should only have one record per product.

The ideal solution really depends on what questions you want to ask of the data.

If you aren't interested in answering "order-centric" questions e.g. "what products are commonly ordered with product X?" then your documents could just be separate line-item docs with a common "orderID" if the whole order arrives in stages. That would still allow you to answer product-centric questions like "most popular/highest revenue product".


(system) #3