Aggregation over aggregation on another field + top_hits

ion.nistor · May 31, 2016, 8:19am

Hi!

I’m trying to run an aggregation over a filtered set of documents, but the filter itself requires an aggregation (most recent ‘test’ for each ‘applicant’). The top-level aggregation would be on a field of the document, but a different field from the one the filtering aggregation uses.

Given the following set of documents:

{ "test": 1, "applicant": 1, "score": 90, "topic": "geometry" },
{ "test": 2, "applicant": 2, "score": 65, "topic": "physics" },
{ "test": 3, "applicant": 2, "score": 88, "topic": "geometry" },
{ "test": 4, "applicant": 1, "score": 23, "topic": "english" },
{ "test": 5, "applicant": 3, "score": 50, "topic": "physics" },
{ "test": 6, "applicant": 3, "score": 77, "topic": "english" }

We’re interested in finding out how many applicants have their highest score in each topic.

In other words, we want to:

1. Keep only the highest-scoring test for each applicant.
2. Group (and count) the results by topic.

So, after step 1, we should be left with only:

{ "test": 1, "applicant": 1, "score": 90, "topic": "geometry" },
{ "test": 3, "applicant": 2, "score": 88, "topic": "geometry" },
{ "test": 5, "applicant": 3, "score": 50, "topic": "physics" },
{ "test": 6, "applicant": 3, "score": 77, "topic": "english" }

and for step 2, group and count them by topic:

{ "topic": "geometry", "count": 2 }
{ "topic": "physics", "count": 1 }
{ "topic": "english", "count": 1 }
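To make the two steps concrete, here is a minimal sketch in plain Python (not Elasticsearch), assuming that step 1 keeps strictly one highest-scoring test per applicant. The function names are illustrative only. Under that assumption applicant 3 contributes only the english test (score 77), so the physics row with score 50 would not survive step 1.

```python
from collections import Counter

# Sample documents from the post.
docs = [
    {"test": 1, "applicant": 1, "score": 90, "topic": "geometry"},
    {"test": 2, "applicant": 2, "score": 65, "topic": "physics"},
    {"test": 3, "applicant": 2, "score": 88, "topic": "geometry"},
    {"test": 4, "applicant": 1, "score": 23, "topic": "english"},
    {"test": 5, "applicant": 3, "score": 50, "topic": "physics"},
    {"test": 6, "applicant": 3, "score": 77, "topic": "english"},
]


def best_test_per_applicant(documents):
    """Step 1: keep the single highest-scoring test for each applicant."""
    best = {}
    for doc in documents:
        current = best.get(doc["applicant"])
        if current is None or doc["score"] > current["score"]:
            best[doc["applicant"]] = doc
    return list(best.values())


def count_by_topic(documents):
    """Step 2: count the remaining tests per topic."""
    return Counter(doc["topic"] for doc in documents)


best_tests = best_test_per_applicant(docs)
topic_counts = count_by_topic(best_tests)
print(dict(topic_counts))
```

With strict per-applicant maxima, the sample data yields two applicants whose best topic is geometry and one whose best topic is english.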

The trouble is, if I use a terms aggregation with top_hits for the filtering:

{
  "aggs": {
    "applicants": {
      "terms": {
        "field": "applicant",
        "order": { "highest_score": "desc" }
      },
      "aggs": {
        "highest_score": { "max": { "field": "score" } },
        "highest_score_top_hits": {
          "top_hits": {
            "size": 1,
            "sort": [{ "score": { "order": "desc" } }]
          }
        }
      }
    }
  }
}

This gets the first step right (top_hits), but if I add a ‘parent’ aggregation on ‘topic’, the top_hits aggregation no longer works properly: each applicant’s tests are split across different ‘topic’ buckets, so the max score is computed per topic instead of per applicant.

It looks like the best approach would be to use a query filter before the ‘topic’ aggregation, but I was unable to write a filter that keeps only the highest-scoring test for each applicant.
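One possible workaround, offered only as a sketch and not as an Elasticsearch-native answer, is to run the ‘applicants’ aggregation as written and count the topics of the top hits on the client. The helper name `count_top_hit_topics` and the `sample_response` dictionary below are hypothetical; the dictionary is hand-written to mirror the shape of a terms + top_hits response and is trimmed to the fields the helper reads. This assumes the terms aggregation's `size` is large enough to return every applicant.

```python
from collections import Counter


def count_top_hit_topics(response):
    """Count the topic of each applicant's single top hit."""
    counts = Counter()
    for bucket in response["aggregations"]["applicants"]["buckets"]:
        hits = bucket["highest_score_top_hits"]["hits"]["hits"]
        if hits:  # top_hits was requested with size 1
            counts[hits[0]["_source"]["topic"]] += 1
    return counts


# Hand-written sample shaped like the aggregation response (assumption).
sample_response = {
    "aggregations": {
        "applicants": {
            "buckets": [
                {
                    "key": 1,
                    "highest_score": {"value": 90.0},
                    "highest_score_top_hits": {"hits": {"hits": [
                        {"_source": {"test": 1, "applicant": 1,
                                     "score": 90, "topic": "geometry"}}
                    ]}},
                },
                {
                    "key": 2,
                    "highest_score": {"value": 88.0},
                    "highest_score_top_hits": {"hits": {"hits": [
                        {"_source": {"test": 3, "applicant": 2,
                                     "score": 88, "topic": "geometry"}}
                    ]}},
                },
                {
                    "key": 3,
                    "highest_score": {"value": 77.0},
                    "highest_score_top_hits": {"hits": {"hits": [
                        {"_source": {"test": 6, "applicant": 3,
                                     "score": 77, "topic": "english"}}
                    ]}},
                },
            ]
        }
    }
}

response_topic_counts = count_top_hit_topics(sample_response)
print(dict(response_topic_counts))
```

The trade-off is that the grouping by topic happens outside Elasticsearch, so every applicant bucket has to be transferred to the client.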

I've also posted this question on Stack Overflow, but have not received any answers yet.

I'd appreciate a hint.

Thanks,
ion

xwang12345 · June 12, 2017, 3:48am

I stumbled on your question. I'm not sure I understand your description of the result for step 1:

{ "test": 1, "applicant": 1, "score": 90, "topic": "geometry" },
{ "test": 3, "applicant": 2, "score": 88, "topic": "geometry" },
{ "test": 5, "applicant": 3, "score": 50, "topic": "physics" },
{ "test": 6, "applicant": 3, "score": 77, "topic": "english" }

Why is applicant 3's physics score of 50 in there?
