Filter based on scripted field in an aggregation

nishantkumar · April 21, 2018, 5:30pm

Here is my source:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "datetime": {
              "gte": "2017-04-01T00:00:00.000Z",
              "lte": "2018-03-31T23:59:59.999Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "all_match": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job_num",
            "size": 200000
          },
          "aggs": {
            "latest_job": {
              "top_hits": {
                "size": 1,
                "sort": [{"rec_date": "desc"}],
                "script_fields": {
                  "reqd_flag": {
                    "script": {
                      "lang": "painless",
                      "inline": "params['_source']['required_flag'] == 'Y' ? 0 : 1"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

I need to filter the records returned by the 'latest_job' aggregation based on the value of the scripted field 'reqd_flag'.
NOTE: The 'all_match' aggregation was added as a workaround so that buckets_path can be applied.
Any input / pointers / suggestions are appreciated.
Thank you in advance.

StackOverflow link

GitHub link

polyfractal · April 27, 2018, 4:19pm

So you only want documents that have required_flag: 1 to be present in latest_job?

If that's the case, you just need to add a filter in the query, or in the all_match filter aggregation, that checks the required_flag field. There's no need to put it inside a script in top_hits.
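As a rough sketch of the query-level variant, the request could look something like the following. It assumes required_flag is mapped as a keyword field (so an exact `term` match works) and that 'Y' is the value to keep; both are guesses based on the script in the original post, so swap in whatever applies to your data.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "datetime": {
              "gte": "2017-04-01T00:00:00.000Z",
              "lte": "2018-03-31T23:59:59.999Z"
            }
          }
        },
        { "term": { "required_flag": "Y" } }
      ]
    }
  },
  "aggs": {
    "jobs": {
      "terms": { "field": "job_num", "size": 200000 },
      "aggs": {
        "latest_job": {
          "top_hits": {
            "size": 1,
            "sort": [{ "rec_date": "desc" }]
          }
        }
      }
    }
  }
}
```

With the condition in the query, only matching documents ever reach the aggregations, so latest_job shows the latest matching record per job_num.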

nishantkumar · May 14, 2018, 2:00pm

Thank you for responding, @polyfractal, and my apologies for not replying sooner.
It is not the field 'required_flag' but the calculated field 'reqd_flag' that I need to filter on. The condition shown for 'reqd_flag' in the example is only a sample; the real one will be a compound condition, and I want to filter the records based on its evaluated value.

polyfractal · May 14, 2018, 2:06pm

I think you'll want to use the Script Query in that case. You still might be able to put that inside the main query, which is preferable. It will be much faster if you evaluate the script as part of the query condition, assuming you're ok with the query evaluating across all documents (and not a subset).
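Roughly, a script query inside the main query might look like this. Treat it as a sketch: the script body is a stand-in for your compound condition, and it assumes required_flag is a keyword field with doc values, since script queries read values through `doc[...]` rather than from `_source`. The `inline` key matches the request earlier in this thread; newer Elasticsearch versions name it `source`.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "datetime": {
              "gte": "2017-04-01T00:00:00.000Z",
              "lte": "2018-03-31T23:59:59.999Z"
            }
          }
        },
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "doc['required_flag'].value == 'Y'"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "jobs": {
      "terms": { "field": "job_num", "size": 200000 },
      "aggs": {
        "latest_job": {
          "top_hits": {
            "size": 1,
            "sort": [{ "rec_date": "desc" }]
          }
        }
      }
    }
  }
}
```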

Alternatively, you could put the script query inside of a Filter Aggregation, which allows you to apply the aggregation to just a subset of the data. But do note that the filter aggregation will be much slower, since it can't take advantage of the optimizations that are applied at query time.
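For the filter aggregation variant, something along these lines should be close. The aggregation name `flag_ok` is made up for illustration, the script is again a placeholder for the real compound condition, and it assumes required_flag is a keyword field with doc values.

```json
{
  "size": 0,
  "query": {
    "range": {
      "datetime": {
        "gte": "2017-04-01T00:00:00.000Z",
        "lte": "2018-03-31T23:59:59.999Z"
      }
    }
  },
  "aggs": {
    "flag_ok": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "inline": "doc['required_flag'].value == 'Y'"
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": { "field": "job_num", "size": 200000 },
          "aggs": {
            "latest_job": {
              "top_hits": {
                "size": 1,
                "sort": [{ "rec_date": "desc" }]
              }
            }
          }
        }
      }
    }
  }
}
```

Here the query still selects every document in the date range, and only the aggregations under `flag_ok` see the filtered subset, so other aggregations placed next to it would run over the unfiltered data.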

TopHits are only designed to show the "top" document in a bucket. You can't hook any logic or filtering onto the top hit; it's purely for display purposes.

nishantkumar · May 14, 2018, 2:29pm

That gives me some perspective, but I would like to state my problem here in much simpler words:

STEP 1: Based on some conditions, fetch the list of records. (All of these filtered records have a field, say 'rec_id', that identifies them. There will be multiple records for each 'rec_id', one for each day present in the field 'datetime'.)
STEP 2: Use the list of 'rec_id' values obtained above to fetch the latest record for each one, based on the 'datetime' field.
STEP 3: Re-apply the filters from STEP 1 to the latest set of records, to filter out any incorrect data.

Any input / suggestions / queries are welcome here.

polyfractal · May 14, 2018, 2:33pm

Gotcha... should be doable. Would it be possible to share a few sample documents and the criteria used to filter? I think it'd be easiest if I were to just write up a quick demonstration query, rather than trying to explain the steps.
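In the meantime, here is one possible shape for it, very much a sketch until I see real documents. The idea: bucket by rec_id, compute the latest datetime over all records in the bucket and the latest datetime over only the records that pass your conditions, then keep the bucket only when the two agree (meaning the latest record itself passes). The `term` filter on required_flag stands in for your actual conditions, the aggregation names and the terms `size` are made up, and datetime is assumed to be a date field.

```json
{
  "size": 0,
  "query": {
    "range": {
      "datetime": {
        "gte": "2017-04-01T00:00:00.000Z",
        "lte": "2018-03-31T23:59:59.999Z"
      }
    }
  },
  "aggs": {
    "by_rec_id": {
      "terms": { "field": "rec_id", "size": 1000 },
      "aggs": {
        "latest_any": {
          "max": { "field": "datetime" }
        },
        "matching": {
          "filter": { "term": { "required_flag": "Y" } },
          "aggs": {
            "latest_matching": {
              "max": { "field": "datetime" }
            }
          }
        },
        "latest_record": {
          "top_hits": {
            "size": 1,
            "sort": [{ "datetime": "desc" }]
          }
        },
        "keep_if_latest_matches": {
          "bucket_selector": {
            "buckets_path": {
              "latestAny": "latest_any",
              "latestMatching": "matching>latest_matching"
            },
            "script": "params.latestAny == params.latestMatching"
          }
        }
      }
    }
  }
}
```

Two things to check on your side: how your Elasticsearch version treats buckets where no record passes the filter (the intent is that they get dropped), and whether two records for the same rec_id can share the same datetime, since in that case a bucket could be kept even though the record shown by latest_record is the non-matching one.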

nishantkumar · May 14, 2018, 2:45pm

The transform shared above is a sample of it. There is a set of conditions that I need to add in a global filter, and the same conditions need to be applied again once I get the latest record. Does that help?

