Hi everyone,
I'm running into a performance problem with nested objects.
My document mapping looks like the following:
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer",
        "store": true
      },
      "sku": {
        "type": "keyword",
        "store": true
      },
      "name": {
        "type": "text",
        "store": true
      },
      "warehouses": {
        "type": "nested",
        "properties": {
          "warehouse_id": { "type": "integer" },
          "qty": { "type": "integer" },
          "location_id": { "type": "integer" },
          "isle_id": { "type": "integer" },
          "row_id": { "type": "integer" }
        }
      }
    }
  }
}
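For reference, a product document indexed into this mapping would look roughly like this (the values are made up):

```json
{
  "id": 1001,
  "sku": "SKU-1001",
  "name": "Remanufactured widget",
  "warehouses": [
    { "warehouse_id": 2435, "qty": 12, "location_id": 7, "isle_id": 3, "row_id": 41 },
    { "warehouse_id": 2436, "qty": 5, "location_id": 2, "isle_id": 9, "row_id": 17 }
  ]
}
```

Each entry in the warehouses array is stored as its own nested document, which is what the nested aggregation iterates over.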
Each product can be located in multiple warehouses, and within a warehouse in multiple isles. We would like to aggregate by warehouse, isle, and row so that our UI can show all available warehouses for each product and let users filter on them.
Let's assume a query matches 6,000 hits in total and each hit has 6,000 warehouse entries. The nested aggregation then has to walk 6,000 × 6,000 = 36 million nested documents, which gets pretty slow, sometimes around 10 seconds per request. We are aiming for a 0.3-0.5 second load time. I don't think hardware alone will solve the problem. Any suggestions on how we can optimize the schema or the query to meet this requirement?
Query I'm using:
{
  "size": 30,
  "_source": false,
  "stored_fields": ["*"],
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Remanufactured" } }
      ]
    }
  },
  "aggregations": {
    "testing": {
      "filter": {
        "term": { "warehouses.warehouse_id": "2435" }
      },
      "aggs": {
        "warehouses": {
          "nested": { "path": "warehouses" },
          "aggregations": {
            "isle_id": {
              "terms": { "field": "warehouses.isle_id", "size": 500 }
            },
            "row_id": {
              "terms": { "field": "warehouses.row_id", "size": 500 }
            }
          }
        }
      }
    }
  }
}
We will have 10 million products in total and around 200k warehouses.
If we denormalize this into a separate record for each unique product + warehouse combination, we end up with over 100 billion records (that is, each product averages more than 10,000 warehouse entries).
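For what it's worth, one variant I've been sketching moves the warehouse_id term filter inside the nested scope, so the terms aggregations only visit the nested entries belonging to that one warehouse rather than every warehouse entry of every hit. A rough, untested sketch using the same fields as above:

```json
{
  "aggregations": {
    "warehouses": {
      "nested": { "path": "warehouses" },
      "aggs": {
        "only_2435": {
          "filter": {
            "term": { "warehouses.warehouse_id": "2435" }
          },
          "aggs": {
            "isle_id": { "terms": { "field": "warehouses.isle_id", "size": 500 } },
            "row_id": { "terms": { "field": "warehouses.row_id", "size": 500 } }
          }
        }
      }
    }
  }
}
```

I'm not sure this is enough on its own, since the nested aggregation still has to iterate the nested documents before the filter applies, but it at least keeps the isle/row buckets scoped to a single warehouse.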