Metric aggregation on terms aggregation result set

alexfritz · December 18, 2019, 10:25pm

Hello

maybe I don't understand Elasticsearch queries, but I have a problem.
I have time series documents to catch events (@timestamp = event timestamp). These events contain fields like: startDate, endDate and duration. Some of these documents can have duplicates which I want to filter.

One document looks like this:
"_id": "mEF1Gm8BFjRRFbh91iLc", "_source": { "@timestamp": 1576676119000, "startDate": "2019-12-18T13:20:00Z", "name": "test1", "duration": 919, "endDate": "2019-12-18T13:35:19Z" },

When I use terms aggregation on the startDate field, I will get all distinct values for this field. How can I get the result set with several fields for the whole distinct documents?

My goal is to get all distinct documents and perform metric aggregations on the result set, like sum up all duration fields.
Is this even possible?

xeraa · December 26, 2019, 12:57pm

Sounds like a Multi-field terms aggregation — either use a script (slower) or copy_to (needs to be set up with the mapping, won't backfill existing documents).

Within that aggregation you can then run another one to calculate the next step.

alexfritz · December 28, 2019, 2:03pm

No, I don't think it's multi-field terms aggregation. I am looking for a solution to get all document fields based on a terms aggregation.
If I am using copy_to, I won't get the original fields in the result set to apply the aggregation on, right?

In SQL it would be something like:
SELECT DISTINCT startDate, duration, name, endDate FROM table;

xeraa · December 28, 2019, 2:30pm

If you want the actual distinct documents, you set the top level size to whatever is the limit of documents you want to get back at most. Be sure to set it on the top level of the JSON document and not inside the terms aggregation because that has a different purpose.

GET /_search
{
   "size": 1000, // Here
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "product",
                "size" : 1200 // This is the calculation per shard, not the returned docs
            }
        }
    }
}

alexfritz · December 29, 2019, 4:22pm

This is not working. The result will be all documents (not the distinct). I think I might have found a solution by using a top_hits aggregation. Is there a way to do an aggregation on a top_hits result set?

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2019-12-18T10:00:00.000+01:00",
        "lt": "2019-12-18T20:00:00.000+01:00",
        "time_zone": "+01:00"
      }
    }
  },
  "aggs": {
    "buckets": {
      "terms": {
        "field": "startDate"
      },
      "aggs": {
        "chartdata": {
          "top_hits": {
            "_source": {
              "includes": [
                "startDate",
                "name",
                "duration",
                "endDate"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

system · January 26, 2020, 4:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is it possible to add a field to all document in an index which the field is an aggregation result? Elasticsearch	4	583	September 26, 2019
Aggregation on most recent document in a group Elasticsearch	5	900	May 22, 2018
Multi Field Aggregation Elasticsearch	6	1028	July 6, 2017
How do I first do a timerange filter on date field then extract distinct elements of another field? Elasticsearch	2	352	June 3, 2019
Filter last X documents based on date, then do a metric aggregation Elasticsearch	4	538	March 21, 2022

Metric aggregation on terms aggregation result set

Related topics