How to filter and narrow the data before aggregation

jun_miao · December 19, 2020, 5:03am

Hello, all

I have a query that groups by groupId:

{
   "aggs":{
      "result":{
          "terms":{
              "field":"groupId",
              "size": 100
         }
     }
  },
  "size": 0
}

Because the index contains more than 100 million documents and groupId is a high-cardinality field, the query is very slow.
So I added a filter to the query, like this:

{
   "aggs":{
      "result":{
          "terms":{
              "field":"groupId",
              "size": 100
         }
     }
  },
  "query":{
      "terms":{
         "groupId":["aaaaaaaaa","bbbbbbbbb","ccccccccc"]
     }
  },
  "size": 0
}

or

{
   "aggs":{
      "filter_by_group":{
         "filter":{
            "terms":{
               "groupId":["aaaaaaaaa","bbbbbbbbb","ccccccccc"]
            }
         },
         "aggs":{
            "result":{
               "terms":{
                  "field":"groupId",
                  "size": 100
               }
            }
         }
      }
   },
   "size": 0
}

Neither change seems to have any effect.

How can I achieve this with the query DSL?
How do I define the data scope of a terms aggregation?
Please give me some suggestions for improving performance.

Thanks.
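
For reference, here is a minimal sketch of the query-scoped variant above, written as a Python dict so it can be checked without a cluster. The helper name `build_filtered_terms_request` is hypothetical and not part of any client library. It adds two things the original requests do not have, both assumptions worth testing rather than confirmed fixes: the filter is placed in `bool.filter` (filter context, no scoring), and the terms aggregation sets `execution_hint` to `map`, which the Elasticsearch documentation suggests only when very few documents match the query, since it avoids global ordinals on the high-cardinality field.

```python
import json


def build_filtered_terms_request(field, values, execution_hint="map"):
    """Build a search body that aggregates only over documents matching `values`.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    values = list(values)
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {
            "bool": {
                # filter context: no scoring is computed for matching documents
                "filter": [{"terms": {field: values}}]
            }
        },
        "aggs": {
            "result": {
                "terms": {
                    "field": field,
                    # assuming the field is single-valued per document,
                    # at most len(values) buckets can come back
                    "size": len(values),
                    # assumption: few documents match, so try the map
                    # execution mode instead of global ordinals
                    "execution_hint": execution_hint,
                }
            }
        },
    }


body = build_filtered_terms_request(
    "groupId", ["aaaaaaaaa", "bbbbbbbbb", "ccccccccc"]
)
print(json.dumps(body, indent=2))
```

Whether `map` actually helps depends on how many documents the filter matches, so it is worth comparing the `took` value of both variants on the real index.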

Christian_Dahlqvist · December 19, 2020, 8:10am

What does the data in the groupId field look like? What is the mapping for this field?

jun_miao · December 21, 2020, 1:44am

The doc looks like this:

{
          "id" : "c07d9542a307e27b",
          "timestamp" : 1608187994071,
          "service" : "monitor",
          "system": "APM",
          "groupId": "8d5b716ebac05c4c",
          "parentId": "7681adf0a845d277",
          ......
}

The mapping is:

{
      "dynamic" : "true",
      "properties" : {
            "id" : {
              "type" : "keyword"
            },
            "service" : {
              "type" : "keyword",
              "normalizer" : "lowercase_normalizer"
            },
            "system" : {
              "type" : "keyword",
              "normalizer" : "lowercase_normalizer"
            },
            "groupId" : {
              "type" : "keyword"
            },
            "parentId" : {
              "type" : "keyword"
            },
            "timestamp" : {
              "type" : "date",
              "format" : "epoch_millis"
            }
       }
}

I want to achieve something similar to the SQL query SELECT DISTINCT groupId ... WHERE <condition>.
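
One way to approximate SELECT DISTINCT with a WHERE condition is the composite aggregation, which pages through all distinct values using `after_key` instead of asking for one large terms bucket list. The following is a sketch under stated assumptions, not a confirmed solution for this index: `distinct_values` is a hypothetical helper, and `search` is a hypothetical callable that takes a request body (dict) and returns the parsed JSON response of a `_search` call, so that any client can be plugged in.

```python
def distinct_values(search, field, query, page_size=1000):
    """Yield every distinct value of `field` among documents matching `query`.

    `search` is a hypothetical callable: request body (dict) in,
    parsed _search response (dict) out.
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{field: {"terms": {"field": field}}}],
        }
        if after is not None:
            # resume right after the last bucket of the previous page
            composite["after"] = after
        body = {
            "size": 0,  # only the aggregation is needed
            # the WHERE condition goes into filter context
            "query": {"bool": {"filter": [query]}},
            "aggs": {"distinct": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["distinct"]
        for bucket in agg["buckets"]:
            yield bucket["key"][field]
        after = agg.get("after_key")
        # stop when the page is empty or no continuation key is returned
        if after is None or not agg["buckets"]:
            break
```

A call such as `distinct_values(search, "groupId", {"term": {"service": "monitor"}})` would then stream the distinct groupId values for one service; the `term` condition here is only an example and can be replaced by any filter.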

system · January 18, 2021, 1:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
