Filter aggregation buckets by top hits scripted field

mouse_clicker · June 7, 2021, 5:03pm

Hello everyone. Let me start by explaining the end goal of the query I'm trying to build. We have a document with the following (relative to the question) mapping:

{
  "analysis-results" : {
    "mappings" : {
      "properties" : {
        "content_id" : {
          "type" : "keyword"
        },
        "event_date" : {
          "type" : "date"
        },
        "object_type" : {
          "type" : "keyword"
        },
        "reviews" : {
          "type" : "nested",
          "properties" : {
            "date" : {
              "type" : "date"
            },
            "label" : {
              "type" : "boolean"
            }
          }
        }
      }
    }
  }
}

So documents have a pair of content_id and object_type, however this pair may have many versions. What I want to do is from all the latest versions of documents (meaning that the document has the latest event_date of all other documents that share the same content_id and object_type) that haven't been reviewed, i.e. there are no entries for reviews, I need to get the earliest one (has the earliest event_date). This is the current query I have:

POST analysis-results/_search
{
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          {
            "content_id": {
              "terms": {
                "field": "content_id"
              }
            }
          }, {
            "object_type": {
              "terms": {
                "field": "object_type"
              }
            }
          }
        ]
      },
      "aggs": {
        "by_event_date": {
          "top_hits": {
            "script_fields": {
               "is_reviewed": {
                  "script": {
                    "source": "params['_source']['reviews'] != null && params['_source']['reviews'].size() > 0"
                  }
                },
                "_source": {
                  "script": {
                    "source": "params['_source']"
                  }
                }
            },
            "size": 1,
            "sort": [
              {
                "event_date": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "top_date": {
          "max": {
            "field": "event_date"
          }
        },
        "mySort": {
          "bucket_sort": {
            "sort": [
              {
                "top_date": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

I've started by aggregating based on content_id and object_type and for each bucket, I get the most recent version of that document. However, I would like to also filter which buckets are returned based on whether that document's reviews field is null or an empty array. I've tried implementing it with a scripted field in the top_hits clause and then use the following:

"reviews_filter": {
  "bucket_selector": {
    "buckets_path": {
      "isReviewed": "by_event_date.top_hits.script_fields.is_reviewed"
    },
  "script": "!params.isReviewed"
 }
}

in order to preserve the buckets that haven't been reviewed. However, adding this causes an error where it can't find an aggregation find.

My question is how can I filter the buckets based on a field from the top document from top_hits? I also tried a few things with top_metrics as well, but couldn't get it to work either. I think it might have something to do with the fact that reviews is a nested field, since it always returned null, even for documents with reviews. So if you can think of a better solution that doesn't involve top_hits, I'm also open to any suggestions. I'm quite new to Elasticsearch so feel free to ask for any additional information you may need. Thanks in advance!

system · July 5, 2021, 5:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filter buckets after top_hits aggregation Elasticsearch	12	1573	June 22, 2020
Filter after aggregation Elasticsearch	3	793	September 17, 2019
What should the bucket path be with a top hits Elasticsearch	1	237	April 28, 2023
Find servers whose last logged event was "error" Elasticsearch	8	855	July 5, 2017
Merging buckets after top_hits Elasticsearch	1	74	April 30, 2024

Filter aggregation buckets by top hits scripted field

Related topics