Aggregate data per document

Hi All,

I am wondering if the following is possible. I want to be able aggregate nested data within a document and then filter by the aggregated data.

So if we have

PUT warehouse/
{
  "mappings": {
  "properties": {
    "inventory": {
      "type": "nested",
      "properties": {
        "equipment": {
          "type": "keyword"
        },
        "price": {
          "type": "float"
        },
        "shopId": {
          "type": "keyword"
        }
      }
    },
    "profile": {
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
  }
}

and then put data

PUT warehouse/_doc/1
{
  "profile": {
    "name": "Place1"
  },
  "inventory": [
    {"equipment":"guitar", "price": 1000.00, "shopId":"1"},
    {"equipment":"guitar", "price": 200.00, "shopId":"2"},
    {"equipment":"guitar", "price": 1.0, "shopId":"4"}
  ]
}

etc

I need to do filter by shopIds, say shopId 1 and shopId 2. Then aggregate that data per document, so for document above the average guitar price for shopId1 and shopId2 is 150.

I then want to only return documents and values that meed a criteria, so I say shopId1 + shopId2 AND average price >130.

I am able to get some aggregation working but it is aggregation across all the documents returned, not per document.

Hoping someone can help me!

JohnJoe

The following gives me the average price across all documents that match the search

GET warehouse/_search
{
  "query": {
    "nested": {
      "path": "inventory",
      "query": {
        "bool": {
              "should": [
                {
                  "term": {
                    "inventory.shopId": "1"
                  }
                },
                {
                  "term": {
                    "inventory.shopId": "2"
                  }
                }
              ]
            }
      }
    }
  },
  "aggs": {
    "inventory": {
      "nested": {
        "path": "inventory"
      },
      "aggs": {
        "priceAgg": {
          "filter": {
            "bool": {
              "should": [
                {
                  "term": {
                    "inventory.shopId": "1"
                  }
                },
                {
                  "term": {
                    "inventory.shopId": "2"
                  }
                }
              ]
            }
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "inventory.price"
              }
            }
          }
        }
      }
    }
  }
}

Result:

"aggregations" : {
    "inventory" : {
      "doc_count" : 9,
      "priceAgg" : {
        "doc_count" : 3,
        "avg_price" : {
          "value" : 2000.0
        }
      }
    }

But what I need is the price for the criteria per document

Hey @JohnJoe and welcome to the community!

In a nested structure, each child entry is its own document. In this way, it makes doing nested aggregations a little interesting.

I worry about the use of nested in this fashion if these lists are going to grow/change. Have you considered changing the index mapping to be "item" centric instead of the current "profile" centric?

This would give you an individual document for each inventory item and significantly more flexible queries for what you're looking to accomplish.

While it may seem more voluminous, under the hood, the number of documents will be similar, yet easier to manage and it will scale better as you grow the index.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.