Aggregate the total sum of aggregation field (bucket_path)


(Nativ At) #1

Hi all, I'm trying with no luck so far, to aggregate the total impressions sum of impressions field, but I keep getting an error. I got the following query:

GET smarttag-2016.06.28.*/_search?search_type=count
{
  "query": {
    "bool": {
      "must": [{
        "range": {
          "@timestamp": {
            "gte": "2016-06-28T10:00:00",
            "lt": "2016-06-28T11:00:00"
          }
        }
      }],
      "must_not": [
        {
          "term": {
            "tagType": {
              "value": "app"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "TagId": {
      "terms": {
        "field": "TagId",
        "size": 0
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "url",
            "size": 0
          },
          "aggs": {
            "tagType": {
              "terms": {
                "field": "type"
              },
              "aggs": {
                "impressions": {
                  "sum": {
                    "field": "imp"
                  }
                }
              }
            }
          } 
        }
      }
    },
    "sum_imp": {
      "sum_bucket": {
          "buckets_path": "TagId>name>tagType>impressions"
          }
      }
  }
}

The error:

   {
       "error": {
          "root_cause": [],
          "type": "reduce_search_phase_exception",
          "reason": "[reduce] ",
          "phase": "query",
          "grouped": true,
          "failed_shards": [],
          "caused_by": {
             "type": "aggregation_execution_exception",
             "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]"
          }
       },
       "status": 503
    } 

I don't understand what am I doing wrong.


(Colin Goodheart-Smithe) #2

Pipeline aggregations cannot sum over multiple nestings of aggregations. The sum_bucket aggregation should be used to sum a metric immediately under a sibling bucket aggregation. If you are summing the impressions over all matching documents why not use the sum metric aggregation instead of the sum_bucket pipeline aggregation?

GET smarttag-2016.06.28.*/_search?search_type=count
{
  "query": {
    "bool": {
      "must": [{
        "range": {
          "@timestamp": {
            "gte": "2016-06-28T10:00:00",
            "lt": "2016-06-28T11:00:00"
          }
        }
      }],
      "must_not": [
        {
          "term": {
            "tagType": {
              "value": "app"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sum_imp": {
      "sum": {
          "field": "imp"
          }
      }
  }
}

(Nativ At) #3

I'm getting an error when I try to place it like that:

If I try to place it before the aggs, I'm getting:


(Colin Goodheart-Smithe) #4

Can you paste the complete request you are trying?


(Nativ At) #5
 GET smart-2016.06.29.*/_search?search_type=count
{
  "query": {
    "bool": {
      "must": [{
        "range": {
          "@timestamp": {
            "gte": "2016-06-28T10:00:00",
            "lt": "2016-06-29T11:00:00"
          }
        }
      }],
      "must_not": [
        {
          "term": {
            "tagType": {
              "value": "vast-inapp"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "cTagId": {
      "terms": {
        "field": "cTagId",
        "size": 0
      },
      "aggs": {
        "mTagId": {
          "terms": {
            "field": "mTagId",
            "size": 0
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "url",
            "size": 0
          },
          "aggs": {
            "tagType": {
              "terms": {
                "field": "tagType"
              },
              "aggs": {
                "impressions": {
                  "sum": {
                    "field": "impressions"
                      }
                    }
                  }
                }
              } 
            }
          }
        }
      }
    }
  },
    "aggs": {
    "sum_impressions": {
      "sum": {
          "field": "impressions"
          }
      }
  }
}

(Colin Goodheart-Smithe) #6

You need to remove the repeated aggs object in your request so it is as the below request. Also you should not use size: 0 on the terms aggregation especially on high cardinality fields (such as your url field) dues to the reasons detailed in this issue: https://github.com/elastic/elasticsearch/issues/18838

GET smart-2016.06.29.*/_search?search_type=count
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "2016-06-28T10:00:00",
              "lt": "2016-06-29T11:00:00"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "tagType": {
              "value": "vast-inapp"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "cTagId": {
      "terms": {
        "field": "cTagId",
        "size": 0
      },
      "aggs": {
        "mTagId": {
          "terms": {
            "field": "mTagId",
            "size": 0
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "url",
                "size": 0
              },
              "aggs": {
                "tagType": {
                  "terms": {
                    "field": "tagType"
                  },
                  "aggs": {
                    "impressions": {
                      "sum": {
                        "field": "impressions"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "sum_impressions": {
      "sum": {
        "field": "impressions"
      }
    }
  }
}

(Nativ At) #7

When I remove the 'aggs' object I'm getting:


(Colin Goodheart-Smithe) #8

I've updated my previous post to correct the placement of braces.


(Nativ At) #9

Thank you! It's working now!


(Ruslan Didyk) #10

Hi all, I'm getting same error when I'm trying to use Pipeline Aggregation with 'terms' aggregation deeper that one level.
Here is my simple mappings: (I use Elasticsearch 5.1.1)

PUT author
{
    "mappings": {
        "author" : {
            "properties": {
                "name" : {
                    "type": "text",
                    "fields": {
                         "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                         }                         
                    }                    
                },
                "gender" : {
                    "type": "text",
                    "fields": {
                         "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                         }
                    }
                },
                "age" : {"type": "integer"}
            }   
        },
        "book" : {
            "properties": {
                "title" : {
                    "type": "text",
                    "fields": {
                         "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                         }
                    }
                },
                "pages" : {"type" : "integer"}
            },
            "_parent" : {"type": "author"}
        }
    }
}

Query below with one 'terms' aggregation works fine:

GET author/author/_search

{
    "size": 0,
    "aggs" : {
        "gender" : {
            "terms" : {
                "field" : "gender.keyword"
            },
            "aggs" : {
                "avg" : {
                    "avg" : {
                        "field" : "age"
                    }
                }
            }
        },
        "max" : {
            "max_bucket" : {
                "buckets_path" : "gender>avg"
            }
        }
    }
}

But when query has two or more 'terms' in fails with an error. Query is bellow:

GET author/author/_search

{
   "size":0,
   "aggs":{
      "gender":{
         "terms":{
            "field":"gender.keyword"
         },
         "aggs":{
            "age":{
               "terms":{
                  "field":"age"
               },
               "aggs":{
                  "avg":{
                     "avg":{
                        "field":"age"
                     }
                  }
               }
            }
         }
      },
      "max":{
         "max_bucket":{
            "buckets_path":"gender>age>avg"
         }
      }
   }
}

Error message:

{
   "error": {
      "root_cause": [],
      "type": "reduce_search_phase_exception",
      "reason": "[reduce] ",
      "phase": "fetch",
      "grouped": true,
      "failed_shards": [],
      "caused_by": {
         "type": "aggregation_execution_exception",
         "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]"
      }
   },
   "status": 503
}

Do you have any ideas how to solve this issue? Or Pipeline Aggregations do not support this case? Thank's a lot for any suggestions.


(Mark Harwood) #11

This looks like a different scenario - please open a different issue where we can discuss it.
It seems an odd request because the terms for "age" will only ever produce buckets that contain a single value so computing the average of this single value (and then the max of these) seems an odd request.
Either way - open a seperate topic please if you want to discuss further.


(Ruslan Didyk) #12

Thank you. I'll create a separate topic for this scenario.


(system) #13