Script Query - Returning multiple values on aggregation

Aggregation Result

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "ALLOGENE THERAPEUTICS",
               "doc_count": 43
            },
            {
               "key": "CELLECTIS",
               "doc_count": 5
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

Merge Aggregation Result

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "STUB",   // ALLOGENE THERAPEUTICS + CELLECTIS
               "doc_count": 48
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

I want to merge "ALLOGENE THERAPEUTICS" and "CELLECTIS"

And change key name to "STUB"

Therefore, I made the following query using script.

The script query language is groovy.

my_field, which is the aggregation target, is a list type.

{
  "size" : 0,
  "query" : {
    "ids" : {
      "types" : [ ],
      "values" : [ "id1", "id2", "id3", "id4" ... ]
    }
  },
  "aggregations" : {
    "649" : {
      "terms" : {
        "script" : {
          "inline" : 
              "def param = new groovy.json.JsonSlurper().parseText(
                  '{\"ALLOGENE THERAPEUTICS\": \"STUB\", \"CELLECTIS\": \"STUB\"}'
              ); 
              def data = doc['my_field'].values; 
              def list = [];
              if (!doc['my_field'].empty) {   // my_field is list type
                  for (x in data) { 
                      if (param[x] != null) { 
                          list.add(param[x]); 
                      } 
                  } 
              }; 
              if (list.isEmpty()) { 
                  return data;  // PFIZER
              } else { 
                  return list;  // list["STUB", "STUB"]
              }"
        },
        "size" : 50
      }
    }
  }
}

According to the results, 48 STUB should be printed, but 47 STUB are being printed.

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "STUB",   // ALLOGENE THERAPEUTICS + CELLECTIS
               "doc_count": 47  // It has to be 48 !!
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

I've tried many things, but I think there's probably a problem with the list type.

I don't think I'm bringing all the elements.

I'd appreciate it if you could give me your opinion.



++ Additional

          def param = new groovy.json.JsonSlurper().parseText(
              '{\"ALLOGENE THERAPEUTICS\": \"STUB\", \"CELLECTIS\": \"STUB\"}'
          ); 
          def data = doc['my_field'].values; 
          def list = [];
          if (!doc['my_field'].empty) {   // my_field is list type
              for (x in data) { 
                  if (param[x] != null) { 
                      list.add(param[x]); 
                  } else {
                      list.add(x);
                  }
              } 
          }; 
          return list;

It is the same even if I try with the corresponding script query.

Are there any documents with both A.. and C..?
As it is "doc_count", it does not double count documents with two STUBs. Can you identify the dropped document and share it?

If you want to just add the two values, sum bucket aggregation might be help.

This is the result of aggregation to top_hits. It can be identified by documentId. How can I count in duplicate?

As you said, there is a document in which "A.." and "C.." overlap. That's why there's one missing. How can we include each of them in the "count"?

The duplicated document "_id" is "us011072644b2".

I suppose you need alternative methods.

or sum aggregation for runtime field with scrpt counting values may work.

@Tomo_M Can you tell me how to use sum aggregation?
In general, sum aggregation is used to calculate values of numeric types. How can I combine doc_count with sum aggregation?

Alternative 1:

GET /test_sum_bucket_agg/_search
{
  "size":0,
  "aggs":{
    "f":{
      "filters": {
        "filters": {
          "all": {"match_all":{}}
        }
      },
      "aggs":{
        "sename":{
          "terms":{
            "field":"currentApplicantInfo.seName"
          }
        },
        "STUB":{
          "bucket_script": {
            "buckets_path": {
              "countA": "sename['ALLOGENE THERAPEUTICS']>_count",
              "countC": "sename['CELLECTIS']>_count"
            },
            "script": "params.countA + params.countC",
            "format": "#"
          }
        }
      }
    }
  }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "f" : {
      "buckets" : {
        "all" : {
          "doc_count" : 3,
          "sename" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "ALLOGENE THERAPEUTICS",
                "doc_count" : 3
              },
              {
                "key" : "CELLECTIS",
                "doc_count" : 1
              },
              {
                "key" : "PFIZER",
                "doc_count" : 1
              }
            ]
          },
          "STUB" : {
            "value" : 4.0,
            "value_as_string" : "4"
          }
        }
      }
    }
  }
}

Alternative 2:

GET /test_sum_bucket_agg/_search
{
  "runtime_mappings": {
    "countAC": {
      "type": "long",
      "script":{
        "lang": "painless",
        "source": "int x=0; for (sename in doc['currentApplicantInfo.seName']){if ((sename == 'ALLOGENE THERAPEUTICS') || (sename == 'CELLECTIS')){x = x+1}} emit(x)"
      }
    }
  },
  "size":0,
  "fields": [
    {"field": "countAC"}
  ],
  "aggs":{
    "STUB":{
      "sum":{
        "field": "countAC",
        "format": "#"
      }
    },
    "terms":{
      "terms":{
        "field":"currentApplicantInfo.seName"
      }
    }
  }
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "ALLOGENE THERAPEUTICS",
          "doc_count" : 3
        },
        {
          "key" : "CELLECTIS",
          "doc_count" : 1
        },
        {
          "key" : "PFIZER",
          "doc_count" : 1
        }
      ]
    },
    "STUB" : {
      "value" : 4.0,
      "value_as_string" : "4"
    }
  }
}

I also recommend to do it on the client side as he said on another topic.