How to filter and narrow the data before aggregation

Hello, all

I have a query that groups by groupId:

{
   "aggs":{
      "result":{
         "terms":{
            "field":"groupId",
            "size": 100
         }
      }
   },
   "size": 0
}

Because the index contains more than 100 million documents and groupId is a high-cardinality field, the query is very slow.
I then tried adding a filter to the query, like this:

{
   "aggs":{
      "result":{
         "terms":{
            "field":"groupId",
            "size": 100
         }
      }
   },
   "query":{
      "terms":{
         "groupId":["aaaaaaaaa","bbbbbbbbb","ccccccccc"]
      }
   },
   "size": 0
}

or

{
   "aggs":{
      "filter_by_group":{
         "filter":{
            "terms":{
               "groupId":["aaaaaaaaa","bbbbbbbbb","ccccccccc"]
            }
         },
         "aggs":{
            "result":{
               "terms":{
                  "field":"groupId",
                  "size": 100
               }
            }
         }
      }
   },
   "size": 0
}

Neither version seems to have any effect.

  1. How can I achieve this with the query DSL?
  2. How do I define the data scope of a terms aggregation?

Any suggestions to improve performance would be appreciated.

Thanks.

What does the data in the groupId field look like? What is the mapping for this field?

The doc looks like this:

{
          "id" : "c07d9542a307e27b",
          "timestamp" : 1608187994071,
          "service" : "monitor",
          "system": "APM",
          "groupId": "8d5b716ebac05c4c",
          "parentId": "7681adf0a845d277",
          ......
}

The mapping is:

{
      "dynamic" : "true",
      "properties" : {
            "id" : {
              "type" : "keyword"
            },
            "service" : {
              "type" : "keyword",
              "normalizer" : "lowercase_normalizer"
            },
            "system" : {
              "type" : "keyword",
              "normalizer" : "lowercase_normalizer"
            },
            "groupId" : {
              "type" : "keyword"
            },
            "parentId" : {
              "type" : "keyword"
            },
            "timestamp" : {
              "type" : "date",
              "format" : "epoch_millis"
            }
       }
}

I want to achieve something similar to SQL's SELECT DISTINCT groupId ... WHERE <condition>.
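
One direction that may be worth trying for a "distinct groupId where condition" request is a composite aggregation scoped by a top-level query, paged with after_key instead of asking a terms aggregation for everything at once. The sketch below only builds the request bodies and walks the pages. It is not something confirmed in this thread, and several names in it are assumptions: the aggregation name distinct_groups, the page size, the example condition on the service field, and the search callable, which stands in for whatever client call actually sends the body to the index's _search endpoint.

```python
import json
from typing import Callable, Iterator, Optional


def build_distinct_body(condition: dict, page_size: int = 1000,
                        after_key: Optional[dict] = None) -> dict:
    """Build one page of a 'distinct groupId where condition' search body."""
    composite = {
        "size": page_size,
        # One source keyed "groupId"; each bucket key is {"groupId": <value>}.
        "sources": [{"groupId": {"terms": {"field": "groupId"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # no hits, only aggregation results
        # The top-level query defines which documents the aggregation sees.
        "query": {"bool": {"filter": [condition]}},
        "aggs": {"distinct_groups": {"composite": composite}},
    }


def iter_distinct_group_ids(search: Callable[[dict], dict], condition: dict,
                            page_size: int = 1000) -> Iterator[str]:
    """Yield each distinct groupId, one composite page at a time.

    `search` is a hypothetical callable: it takes a request body and
    returns the parsed JSON response of a _search call.
    """
    after_key = None
    while True:
        response = search(build_distinct_body(condition, page_size, after_key))
        agg = response["aggregations"]["distinct_groups"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["groupId"]
        after_key = agg.get("after_key")
        # Stop on an empty page or when no continuation key is returned.
        if not agg["buckets"] or after_key is None:
            break


if __name__ == "__main__":
    # Illustrative condition only; substitute the real WHERE clause.
    example = build_distinct_body({"term": {"service": "monitor"}}, page_size=2)
    print(json.dumps(example, indent=2))
```

Each page costs one request, so this trades a single large terms aggregation for several smaller ones. Whether it is actually faster on this index would need to be measured.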
