Nested object aggregation performance issues

brusko · February 8, 2021, 3:45pm

Hi everyone,
I'm challenged with a problem where I use nested objects.
Given I have a document structure that looks like the following

        {
  "mappings": {
    "properties": {
      "id": {
        "type": "integer",
        "store": true
      },
      "sku": {
        "type": "keyword",
        "store": true
      },
      "name": {
        "type": "text",
        "store": true
      },
      "warehouses": {
        "type": "nested",
        "properties": {
          "warehouse_id": {
            "type": "integer"
          },
          "qty": {
            "type": "integer"
          },
          "location_id": {
            "type": "integer"
          },
          "isle_id": {
            "type": "integer"
          },
          "row_id": {
            "type": "integer"
          }
        }
      }
    }
  }
}

Multiple products can be located in multiple warehouses plus be in multiple isles. We would like to have aggregations by warehouses, isles and rows so on our UI we can then see all available warehouses for each products and filter based on them

Let's assume my query returns 6000 hits totals and each of these hits has 6000 warehouses. So nested aggregation will then aggregate based on 36 million values, which becomes pretty slow, sometimes 10 seconds to load. We are aiming at 0.3-0.5 seconds load time. I don't think hardware will solve the problem. Any suggestions on how we can optimize the schema/query to fulfill this requirement?

Query I'm using:

    {
    "size": 30,
    "_source": false,
    "stored_fields": [
        "*"
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "Remanufactured"
                    }
                }
            ]
        }
    },
    "aggregations": {
        "testing": {
            "filter": {
                "term": {
                    "warehouses.warehouse_id": "2435"
                }
            },
            "aggs": {
                "warehouses": {
                    "nested": {
                        "path": "warehouses"
                    },
                    "aggregations": {
                        
                                "isle_id": {
                                    "terms": {
                                        "field": "warehouses.isle_id",
                                        "size": 500
                                    }
                                },
                                "row_id": {
                                    "terms": {
                                        "field": "warehouses.row_id",
                                        "size": 500
                                    }
                                },
                            }
              
                }
            }
        }
    }
}

We will have total 10 million products and total of 200k warehouses.

If we denormilize this into each separate record for each warehouse + product unique value, we will end up in over 100 billion records.

system · March 8, 2021, 3:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Performance Issue with Sub-Aggregations and Nested Document Structure? Elasticsearch	3	2322	March 27, 2017
Performance on query with nested objects Elasticsearch	7	13546	June 18, 2019
Extremely slow nested aggregations, need suggestions on modeling/shards Elasticsearch	1	482	July 5, 2017
Aggregation on nested document always takes 2-3 seconds? Elasticsearch	2	527	July 6, 2017
Terms aggregation with nested objects performs poorly Elasticsearch	1	269	August 24, 2021

Nested object aggregation performance issues

Related topics