How to make Average sub aggregation to consider parent aggregation count instead of current aggregation count?

I am using ES 7.10.2

Mappings for fields are as follows:

{
  "properties": {
    "ItemInfo": {
      "type": "nested",
      "include_in_parent": true,
      "properties": {
        "ShipInfo": {
          "type": "nested",
          "include_in_parent": true,
          "properties": {
            "City": {
              "type": "keyword",
              "store": true
            }
          }
        },
        "OriginalAmount": {
          "type": "double",
          "store": true
        }
      }
    }
  }
}

Then I indexed the document where some document have "OriginalAmount" while others not.

Now when I query ES for aggregation, where I want to calculate Avg of OriginalAmount for each city, as below:-

POST myindex/_search
  {
  "from": 0,
  "size": 0,
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "aggregations": {
    "City": {
      "terms": {
        "field": "ItemInfo.ShipInfo.City",
        "size": 10
      },
      "aggregations": {
        "OriginalAmount": {
          "avg": {
            "field": "OriginalAmount"
          }
        }
      }
    }
  }
}

The result I get is :-

"buckets" : [
        {
          "key" : "Littleton",
          "doc_count" : 5,
          "OriginalAmount" : {
            "value" : 76.98666666666666
          }
        },
        {
          "key" : "Paris",
          "doc_count" : 1,
          "OriginalAmount" : {
            "value" : null
          }
        }
      ]

As you can see for document with City "Paris" the avg I am getting is null, as that document is not indexed with OriginalAmount.

Is there any property on ES or any configuration within Aggregation, where it considers parent agg count instead of current aggregation count, so that instead of getting null I will get 0?

Reverse nested aggregation?

@Tomo_M Thanks for reply!! will check this out.

1 Like

@Tomo_M I think if I provide a missing configuration to my avg aggregation, then it will consider the parent aggregation count indirectly, like in above example for if I provide "missing":"0", then for document with city "Paris" which doesn't have OriginalAmount, it will be treated as 0, and then Avg will be calculated as Avg=sum/count=0/0, which will result in 0, instead of null.

Is this the right approach?

I suppose I misunderstood you were using nested aggregation and wanted to know how to aggregate out of the nested objects. I wonder why you need nested objects in this case.

You need reverse nested aggregation when include_in_parent is false:

PUT /test_reverse_nested_aggregation/
{
  "mappings": {
    "properties": {
      "ItemInfo": {
        "type": "nested",
        "include_in_parent": false,
        "properties": {
          "ShipInfo": {
            "type": "nested",
            "include_in_parent": false,
            "properties": {
              "City": {
                "type": "keyword",
                "store": true
              }
            }
          }
        }
      },
      "OriginalAmount": {
        "type": "double",
        "store": true
      }
    }
  }
}

POST /test_reverse_nested_aggregation/_doc
{
  "ItemInfo":[
    {
      "ShipInfo":[
        {"City": "Tokyo"}
      ]
    },
    {
      "ShipInfo":[
        {"City": "New York"},
        {"City": "London"}
      ]
    }
  ],
  "OriginalAmount":100
}

POST /test_reverse_nested_aggregation/_doc
{
  "ItemInfo":[
    {
      "ShipInfo":[
        {"City": "Paris"}
      ]
    }
  ]
}

POST test_reverse_nested_aggregation/_search
  {
  "from": 0,
  "size": 0,
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "aggregations": {
    "n":{
      "nested": {
        "path": "ItemInfo.ShipInfo"
      },
      "aggs":{
        "City":{
          "terms":{
            "field":"ItemInfo.ShipInfo.City"
          },
          "aggs":{
            "p":{
              "reverse_nested": {},
              "aggs":{
                "OriginalAmount":{
                  "avg":{
                    "field":"OriginalAmount",
                    "missing":0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

As you set "include_in_parent": true and used usual aggregations without nested aggregation, the average aggregation were just culculated on OriginalAmount Field for usual document level aggregation. No need to use "parent".

Anyway setting "missing":"0" is the right approach, I suppose. If you set "missing": "0", the sum will be 0 and the count should be doc_count (not 0). Then Avg = 0/doc_count = 0.

1 Like

@Tomo_M Thanks for all the help!!

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.