Aggregating Top hits results

Hi,
We have the following documents in our ES index:

{
component: "comp1",
failures: 0,
errors: 1,
success: 45,
eventTimestamp: "01-JAN-2018"
},
{
component: "comp2",
failures: 1,
errors: 1,
success: 33,
eventTimestamp: "01-JAN-2018"
},
{
component: "comp1",
failures: 0,
errors: 0,
success: 46,
eventTimestamp: "02-JAN-2018"
},
{
component: "comp2",
failures: 0,
errors: 0,
success: 35,
eventTimestamp: "02-JAN-2018"
}

What I want is the sum(top_hits(success)) group by component, week. In this case, I am looking for 81 for the week of Jan 01. So essentially get the top hit for each component for the week of Jan 01 and then do a sum of all the top hits results. Is this possible? If this is not directly possible in Elasticsearch currently, will it be supported in the future as a pipeline aggregation?

Thanks

It appears that your success field contains a running sum of success events up to the moment the event was registered.

Under the assumption that the success field increases monotonically for each component, the following approach works. For a given time interval

  • aggregate by component keyword
  • identify the maximum value of the success field
  • sum up the success values of each component bucket using a sum_bucket aggregation on the maximum values
DELETE test

PUT test

PUT test/_mapping/doc
{
  "properties": {
    "component": {
      "type": "keyword"
    },
    "errors": {
      "type": "long"
    },
    "eventTimestamp": {
      "type": "date"
    },
    "failures": {
      "type": "long"
    },
    "success": {
      "type": "long"
    }
  }
}

PUT test/doc/1
{
  "component": "comp1",
  "failures": 0,
  "errors": 1,
  "success": 45,
  "eventTimestamp": "2018-01-01T12:34:56"
}

PUT test/doc/2
{
  "component": "comp2",
  "failures": 1,
  "errors": 1,
  "success": 33,
  "eventTimestamp": "2018-01-02T01:23:45"
}

PUT test/doc/3
{
  "component": "comp1",
  "failures": 0,
  "errors": 0,
  "success": 46,
  "eventTimestamp": "2018-01-03T23:45:01"
}

PUT test/doc/4
{
  "component": "comp2",
  "failures": 0,
  "errors": 0,
  "success": 35,
  "eventTimestamp": "2018-01-04T11:22:33"
}
GET test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "eventTimestamp": {
            "gte": "2018-01-01",
            "lt": "2018-01-08"
          }
        }
      }
    }
  }, 
  "aggs": {
    "components": {
      "terms": {
        "field": "component",
        "size": 10,
        "order": {
          "max_success": "desc"
        }
      },
      "aggs": {
        "max_success": {
          "max": {
            "field": "success"
          }
        }
      }
    },
    "sum_success": {
      "sum_bucket": {
        "buckets_path": "components>max_success"
      }
    }
  }
}

If you really need to sum up the values of the latest component success values in the time interval, things get a bit more complicated. To extract the success values, first use a terms aggregation to break down the data set by component, but make sure that only the latest value for each component is retained (size 1, sort by descending eventTimestamp). then extract the success value into a separate pipeline aggregation (e.g. max_bucket), and finally sum up the extracted success values with a sum_buckets aggregation.

GET test/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "eventTimestamp": {
            "gte": "2018-01-01",
            "lt": "2018-01-08"
          }
        }
      }
    }
  },
  "aggs": {
    "components": {
      "terms": {
        "field": "component",
        "size": 100
      },
      "aggs": {
        "latest": {
          "terms": {
            "field": "eventTimestamp",
            "size": 1,
            "order": {
              "_key": "desc"
            }
          },
          "aggs": {
            "success": {
              "max": {
                "field": "success"
              }
            }
          }
        },
        "success": {
          "max_bucket": {
            "buckets_path": "latest>success"
          }
        }
      }
    },
    "sum_success": {
      "sum_bucket": {
        "buckets_path": "components>success"
      }
    }
  }
}

Thank you! The second scenario involving eventTimestamp is what my requirement is. I want the latest entry. This gives me the basis to work on. Will develop from here and will let you know how it turns out :slight_smile:
For example, I want to convert the top filter for eventTimestamp into date histogram aggregation of week since I want the sum of "latest values for each component per week" for a 6 month period.
As I said, this gives me a very good direction on developing the final query.
Thanks again!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.