Filter aggregation buckets by top hits scripted field

Hello everyone. Let me start by explaining the end goal of the query I'm trying to build. We have a document with the following (relative to the question) mapping:

{
  "analysis-results" : {
    "mappings" : {
      "properties" : {
        "content_id" : {
          "type" : "keyword"
        },
        "event_date" : {
          "type" : "date"
        },
        "object_type" : {
          "type" : "keyword"
        },
        "reviews" : {
          "type" : "nested",
          "properties" : {
            "date" : {
              "type" : "date"
            },
            "label" : {
              "type" : "boolean"
            }
          }
        }
      }
    }
  }
}

So documents have a pair of content_id and object_type, however this pair may have many versions. What I want to do is from all the latest versions of documents (meaning that the document has the latest event_date of all other documents that share the same content_id and object_type) that haven't been reviewed, i.e. there are no entries for reviews, I need to get the earliest one (has the earliest event_date). This is the current query I have:

POST analysis-results/_search
{
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {
            "content_id": {
              "terms": {
                "field": "content_id"
              }
            }
          }, {
            "object_type": {
              "terms": {
                "field": "object_type"
              }
            }
          }
        ]
      },
      "aggs": {
        "by_event_date": {
          "top_hits": {
            "script_fields": {
               "is_reviewed": {
                  "script": {
                    "source": "params['_source']['reviews'] != null && params['_source']['reviews'].size() > 0"
                  }
                },
                "_source": {
                  "script": {
                    "source": "params['_source']"
                  }
                }
            },
            "size": 1,
            "sort": [
              {
                "event_date": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "top_date": {
          "max": {
            "field": "event_date"
          }
        },
        "mySort": {
          "bucket_sort": {
            "sort": [
              {
                "top_date": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

I've started by aggregating based on content_id and object_type and for each bucket, I get the most recent version of that document. However, I would like to also filter which buckets are returned based on whether that document's reviews field is null or an empty array. I've tried implementing it with a scripted field in the top_hits clause and then use the following:

"reviews_filter": {
  "bucket_selector": {
    "buckets_path": {
      "isReviewed": "by_event_date.top_hits.script_fields.is_reviewed"
    },
  "script": "!params.isReviewed"
 }
}

in order to preserve the buckets that haven't been reviewed. However, adding this causes an error where it can't find an aggregation find.

My question is how can I filter the buckets based on a field from the top document from top_hits? I also tried a few things with top_metrics as well, but couldn't get it to work either. I think it might have something to do with the fact that reviews is a nested field, since it always returned null, even for documents with reviews. So if you can think of a better solution that doesn't involve top_hits, I'm also open to any suggestions. I'm quite new to Elasticsearch so feel free to ask for any additional information you may need. Thanks in advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.