Filter buckets after top_hits aggregation

Hi,

I have an index with documents that only have 3 fields: id, timestamp and status.

I want to retrieve the newest document for each id, but if that document's status equals "something", I would like to ignore that bucket completely.

What's the best way to do this?

I was able to retrieve the newest document for each id with the query below but I don't know how to filter buckets based on the document's status.

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_order_id": {
      "terms": {
        "field": "id.keyword"
      },
      "aggs": {
        "top_group_hits": {
          "top_hits": {
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}

Thanks,
Alex.

Bump

Use a query other than "match_all". You can use a bool query with a must_not clause and put a "match" expression in there for the status you want to ignore.
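A minimal sketch of that suggestion, using the field names from the original query (it assumes status is a text field that "match" can query directly; adjust to your mapping):

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        { "match": { "status": "something" } }
      ]
    }
  },
  "aggs": {
    "group_by_order_id": {
      "terms": { "field": "id.keyword" },
      "aggs": {
        "top_group_hits": {
          "top_hits": {
            "sort": [ { "timestamp": { "order": "desc" } } ],
            "size": 1
          }
        }
      }
    }
  }
}
```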

Hi Mark,

I don't think that is going to work.

Suppose I have 2 events:

id: 1, status: A, timestamp: 2020-05-05
id: 1, status: B, timestamp: 2020-04-04

If I filter out status A in the query, I'd get a bucket with the second event (status B).
What I really want is to filter out the bucket if the newest document (in this example, the first event) contains status A. So, the result of my query would be 0 buckets or a single empty bucket (with no hits).

Ah gotcha. Sounds like a “last known status” problem, for which an entity-centric index is best. For that, see the transform API.

Thank you for the tip Mark :slightly_smiling_face:! Will look into that API.

Try with bucket selector https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-aggregations-pipeline-bucket-selector-aggregation.html
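One way to make bucket_selector fit here, since it only works on numeric bucket paths: compare the newest timestamp in the bucket against the newest timestamp among the "something" documents, and keep the bucket only when they differ. This is a sketch, not tested against this mapping; it assumes status has a keyword sub-field, and uses gap_policy "insert_zeros" so buckets containing no "something" documents evaluate to 0 instead of being skipped:

```json
{
  "size": 0,
  "aggs": {
    "group_by_order_id": {
      "terms": { "field": "id.keyword" },
      "aggs": {
        "latest_ts": { "max": { "field": "timestamp" } },
        "latest_something": {
          "filter": { "term": { "status.keyword": "something" } },
          "aggs": { "ts": { "max": { "field": "timestamp" } } }
        },
        "top_group_hits": {
          "top_hits": {
            "sort": [ { "timestamp": { "order": "desc" } } ],
            "size": 1
          }
        },
        "keep_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "latest": "latest_ts",
              "latestSomething": "latest_something>ts"
            },
            "gap_policy": "insert_zeros",
            "script": "params.latest != params.latestSomething"
          }
        }
      }
    }
  }
}
```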

One of the issues with trimming via bucket selectors is that after trimming you may find you have no buckets left at all and have to go back and ask for more data with additional searches. It can be a workable solution, but it depends on the data, and the worst-case scenario is very inefficient.

Hi Mark,

I tried to use the transform API but it doesn't seem to support the top_hits aggregation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/put-transform.html

How would you do it?

Hi,

Please have a look at this Painless example (it should also work in older versions).
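The core of that approach is a scripted_metric aggregation inside the transform's pivot that keeps the newest _source per group. A rough sketch, adapted to the field names in this thread (the aggregation name and the epoch-millis comparison are illustrative, not a tested transform):

```json
"aggregations": {
  "latest_doc": {
    "scripted_metric": {
      "init_script": "state.timestamp_latest = 0L; state.last_doc = ''",
      "map_script": "def ts = doc['timestamp'].value.toInstant().toEpochMilli(); if (ts > state.timestamp_latest) { state.timestamp_latest = ts; state.last_doc = new HashMap(params['_source']) }",
      "combine_script": "return state",
      "reduce_script": "def last_doc = ''; def timestamp_latest = 0L; for (s in states) { if (s.timestamp_latest > timestamp_latest) { timestamp_latest = s.timestamp_latest; last_doc = s.last_doc } } return last_doc"
    }
  }
}
```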

For filtering out complete buckets/documents, I suggest using a drop processor that runs after the pivot. This can be done with an ingest pipeline, which you can specify as part of the transform destination.
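A minimal sketch of such a pipeline (the pipeline name and the status value are placeholders; it assumes the pivoted document carries the latest status in a top-level status field):

```json
PUT _ingest/pipeline/drop-latest-something
{
  "processors": [
    {
      "drop": {
        "if": "ctx.status == 'something'"
      }
    }
  ]
}
```

The transform would then reference it in its destination, e.g. "dest": { "index": "latest-doc-by-id", "pipeline": "drop-latest-something" }.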

Hi Hendrik,

I followed the painless example and I was able to transform the index into a "last document" index grouped by id.

As my source indices are suffixed with the date the document was indexed (sample indices: docs-2020-05-25, docs-2020-04-24, ...), can I do the same for the index generated by the transform API?
I would like to generate indexes like: latest-doc-by-id-2020-05-25, latest-doc-by-id-2020-05-24, ...

Thanks,
Alex

Great that you found a solution! Regarding your follow-up question:

I am thinking of ingest again and using the set processor to set _index: https://www.elastic.co/guide/en/elasticsearch/reference/current/accessing-data-in-pipelines.html#accessing-metadata-fields.

The name for this index could be based on a field you create with transform, e.g. as part of the script.
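A rough sketch of that idea, assuming the transform script also emits a field holding the latest document's date (the field name "day" here is made up for illustration):

```json
PUT _ingest/pipeline/route-latest-doc
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "latest-doc-by-id-{{day}}"
      }
    }
  ]
}
```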