Filter based on scripted field in an aggregation


(Nishant Kumar) #1

Here is my source:

{
  "size": 0, 
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "datetime": {
              "gte": "2017-04-01T00:00:00.000Z",
              "lte": "2018-03-31T23:59:59.999Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "all_match": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job_num",
            "size": 200000
          },
          "aggs": {
            "latest_job": {
              "top_hits": { 
                "size": 1,
                "sort": [{"rec_date": "desc"}],
                "script_fields": {
                  "reqd_flag":{
                    "script": {
                      "lang": "painless",
                      "inline": "params['_source']['required_flag'] == 'Y' ? 0 : 1"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I need to filter the records returned by the 'latest_job' aggregation based on the value of the scripted field 'reqd_flag'.
NOTE: The 'all_match' aggregation was added as a workaround so that buckets_path could be applied.
Any input / pointers / suggestions are appreciated.
Thank you in advance.

StackOverflow link

GitHub link


(Zachary Tong) #2

So you only want documents that have required_flag: 1 to be present in latest_job?

If that's the case, you just need to add a filter in the query, or in the all_match filter aggregation, that checks the required_flag field. There's no need to put it inside a script in top_hits.
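As a rough illustration of the first option, here is a minimal sketch that builds the request body as a Python dict with the check moved into the main query. The function name and the default value 'Y' are assumptions for illustration (the original script compares required_flag to 'Y'), and it assumes required_flag is indexed as an exact-match keyword field. Note that filtering at query time means top_hits returns the latest document among those that match, not the latest document overall.

```python
import json


def build_flag_filtered_query(flag_value="Y"):
    """Build a request body that only aggregates documents whose
    required_flag equals flag_value (hypothetical helper)."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Filter clauses restrict the document set without scoring.
                "filter": [
                    {
                        "range": {
                            "datetime": {
                                "gte": "2017-04-01T00:00:00.000Z",
                                "lte": "2018-03-31T23:59:59.999Z",
                            }
                        }
                    },
                    # Assumption: required_flag is an exact-match (keyword) field.
                    {"term": {"required_flag": flag_value}},
                ]
            }
        },
        "aggs": {
            "jobs": {
                "terms": {"field": "job_num", "size": 200000},
                "aggs": {
                    "latest_job": {
                        "top_hits": {"size": 1, "sort": [{"rec_date": "desc"}]}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the _search endpoint.
    print(json.dumps(build_flag_filtered_query(), indent=2))
```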


(Nishant Kumar) #3

Thank you for responding, @polyfractal, and my apologies for not replying sooner.
It is not the field 'required_flag' but the calculated field 'reqd_flag' that I need to filter on. The condition that computes 'reqd_flag' in the example is just a placeholder; the real one will be a compound condition, and I want to filter the records based on its evaluated value.


(Zachary Tong) #4

I think you'll want to use the Script Query in that case. You still might be able to put that inside the main query, which is preferable. It will be much faster if you evaluate the script as part of the query condition, assuming you're OK with the query evaluating across all documents (and not a subset).
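A minimal sketch of what that could look like, written as a Python dict for the request body. The helper names and the CONDITION expression are hypothetical stand-ins for the real compound condition. It assumes the fields used in the script have doc values (a script query reads them through doc[...]), and it uses the "inline" key to match the query posted above; newer Elasticsearch versions name that key "source".

```python
import json

# Hypothetical stand-in for the real compound condition; it must return a boolean.
CONDITION = "doc['required_flag'].value == 'Y'"


def script_filter(condition, script_key="inline"):
    """Wrap a Painless boolean expression in a script query clause."""
    return {"script": {"script": {"lang": "painless", script_key: condition}}}


def build_script_query(condition=CONDITION, script_key="inline"):
    """Request body with the script evaluated as part of the main query."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "datetime": {
                                "gte": "2017-04-01T00:00:00.000Z",
                                "lte": "2018-03-31T23:59:59.999Z",
                            }
                        }
                    },
                    # Only documents for which the script returns true survive.
                    script_filter(condition, script_key),
                ]
            }
        },
        "aggs": {
            "jobs": {
                "terms": {"field": "job_num", "size": 200000},
                "aggs": {
                    "latest_job": {
                        "top_hits": {"size": 1, "sort": [{"rec_date": "desc"}]}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_script_query(), indent=2))
```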

Alternatively, you could put the script query inside of a Filter Aggregation, which allows you to apply the aggregation to just a subset of the data. But do note that the filter aggregation will be much slower, since it can't take advantage of the optimizations that are applied at query time.
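For comparison, a sketch of the filter aggregation variant, again as a Python dict for the request body. The aggregation name "flagged", the function name, and the CONDITION expression are assumptions for illustration; the script still needs doc values for the fields it reads, and "inline" is used only to match the query posted earlier in the thread.

```python
import json

# Hypothetical stand-in for the real compound condition; it must return a boolean.
CONDITION = "doc['required_flag'].value == 'Y'"


def build_filter_agg_query(condition=CONDITION, script_key="inline"):
    """Request body that applies the script inside a filter aggregation,
    so only the nested aggregations see the filtered subset."""
    script_clause = {
        "script": {"script": {"lang": "painless", script_key: condition}}
    }
    return {
        "size": 0,
        # The main query still selects the date range for everything.
        "query": {
            "range": {
                "datetime": {
                    "gte": "2017-04-01T00:00:00.000Z",
                    "lte": "2018-03-31T23:59:59.999Z",
                }
            }
        },
        "aggs": {
            # "flagged" is a hypothetical name for the filter aggregation.
            "flagged": {
                "filter": script_clause,
                "aggs": {
                    "jobs": {
                        "terms": {"field": "job_num", "size": 200000},
                        "aggs": {
                            "latest_job": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{"rec_date": "desc"}],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_filter_agg_query(), indent=2))
```

Other aggregations placed next to "flagged" would still run over the full date range, which is the main reason to prefer this form over the query-level filter.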

TopHits are only designed to show the "top" document in a bucket. You can't hook any logic or filtering onto the top hit; it's purely for display purposes.


(Nishant Kumar) #5

That gives me some perspective, but I would like to state my problem here in much simpler words:

STEP 1: Based on some conditions, fetch the list of matching records. All of these records have a field, say 'rec_id', that serves as their identifier. There will be multiple records for each 'rec_id', one for each day present in the 'datetime' field.
STEP 2: Use the list of 'rec_id' values obtained above to fetch the latest record for each one, based on the 'datetime' field.
STEP 3: Re-apply the filters from STEP 1 to the latest set of records, to filter out any that no longer satisfy them.
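
To make the intended semantics concrete, the three steps can be sketched in plain Python over an in-memory list. This is not an Elasticsearch query, only an illustration of the expected result; the sample records, the function name, and the 'required_flag' predicate are hypothetical, and 'datetime' values are assumed to be ISO-8601 strings in one format so that they compare correctly as text.

```python
def latest_matching(records, predicate):
    """Return the latest record per rec_id, keeping only those that
    still satisfy the predicate (hypothetical helper)."""
    # STEP 1: collect the rec_ids that have at least one matching record.
    matching_ids = {r["rec_id"] for r in records if predicate(r)}

    # STEP 2: for each of those rec_ids, find the latest record overall.
    latest = {}
    for r in records:
        if r["rec_id"] not in matching_ids:
            continue
        current = latest.get(r["rec_id"])
        if current is None or r["datetime"] > current["datetime"]:
            latest[r["rec_id"]] = r

    # STEP 3: re-apply the same conditions to the latest records.
    kept = [r for r in latest.values() if predicate(r)]
    return sorted(kept, key=lambda r: r["rec_id"])


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        {"rec_id": "A", "datetime": "2018-01-01T00:00:00.000Z", "required_flag": "Y"},
        {"rec_id": "A", "datetime": "2018-01-02T00:00:00.000Z", "required_flag": "N"},
        {"rec_id": "B", "datetime": "2018-01-01T00:00:00.000Z", "required_flag": "N"},
        {"rec_id": "B", "datetime": "2018-01-02T00:00:00.000Z", "required_flag": "Y"},
        {"rec_id": "C", "datetime": "2018-01-01T00:00:00.000Z", "required_flag": "N"},
    ]
    for rec in latest_matching(sample, lambda r: r["required_flag"] == "Y"):
        print(rec)
```

In this sample, 'A' matched on an earlier day but its latest record does not, so it is dropped in STEP 3; only the latest record for 'B' is returned.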

Any input / suggestions / queries are welcome here.


(Zachary Tong) #6

Gotcha... should be doable. Would it be possible to share a few sample documents and the criteria used to filter? I think it'd be easiest if I were to just write up a quick demonstration query, rather than trying to explain the steps.


(Nishant Kumar) #7

What I shared above was a sample of it. There is a set of conditions that I need to add in the global filter, and the same conditions need to be applied again once I get the latest record. Does that help?


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.