Metric aggregation on terms aggregation result set

Hello

maybe I don't understand Elasticsearch queries, but I have a problem.
I have time series documents to catch events (@timestamp = event timestamp). These events contain fields like: startDate, endDate and duration. Some of these documents can have duplicates which I want to filter.

One document looks like this:
"_id": "mEF1Gm8BFjRRFbh91iLc", "_source": { "@timestamp": 1576676119000, "startDate": "2019-12-18T13:20:00Z", "name": "test1", "duration": 919, "endDate": "2019-12-18T13:35:19Z" },

When I use terms aggregation on the startDate field, I will get all distinct values for this field. How can I get the result set with several fields for the whole distinct documents?

My goal is to get all distinct documents and perform metric aggregations on the result set, like sum up all duration fields.
Is this even possible?

Sounds like a Multi-field terms aggregation — either use a script (slower) or copy_to (needs to be set up with the mapping, won't backfill existing documents).

Within that aggregation you can then run another one to calculate the next step.

1 Like

No, I don't think it's multi-field terms aggregation. I am looking for a solution to get all document fields based on a terms aggregation.
If I am using copy_to, I won't get the original fields in the result set to apply the aggregation on, right?

In SQL it would be something like:
SELECT DISTINCT startDate, duration, name, endDate FROM table;

If you want the actual distinct documents, you set the top level size to whatever is the limit of documents you want to get back at most. Be sure to set it on the top level of the JSON document and not inside the terms aggregation because that has a different purpose.

GET /_search
{
   "size": 1000, // Here
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "product",
                "size" : 1200 // This is the calculation per shard, not the returned docs
            }
        }
    }
}

This is not working. The result will be all documents (not the distinct). I think I might have found a solution by using a top_hits aggregation. Is there a way to do an aggregation on a top_hits result set?

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2019-12-18T10:00:00.000+01:00",
        "lt": "2019-12-18T20:00:00.000+01:00",
        "time_zone": "+01:00"
      }
    }
  },
  "aggs": {
    "buckets": {
      "terms": {
        "field": "startDate"
      },
      "aggs": {
        "chartdata": {
          "top_hits": {
            "_source": {
              "includes": [
                "startDate",
                "name",
                "duration",
                "endDate"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.