Script Query - Returning multiple values on aggregation

hyun · January 12, 2022, 12:53am

Aggregation Result

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "ALLOGENE THERAPEUTICS",
               "doc_count": 43
            },
            {
               "key": "CELLECTIS",
               "doc_count": 5
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

Merge Aggregation Result

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "STUB",   // ALLOGENE THERAPEUTICS + CELLECTIS
               "doc_count": 48
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

I want to merge "ALLOGENE THERAPEUTICS" and "CELLECTIS"

And change key name to "STUB"

Therefore, I made the following query using script.

The script query language is groovy.

my_field, which is the aggregation target, is a list type.

{
  "size" : 0,
  "query" : {
    "ids" : {
      "types" : [ ],
      "values" : [ "id1", "id2", "id3", "id4" ... ]
    }
  },
  "aggregations" : {
    "649" : {
      "terms" : {
        "script" : {
          "inline" : 
              "def param = new groovy.json.JsonSlurper().parseText(
                  '{\"ALLOGENE THERAPEUTICS\": \"STUB\", \"CELLECTIS\": \"STUB\"}'
              ); 
              def data = doc['my_field'].values; 
              def list = [];
              if (!doc['my_field'].empty) {   // my_field is list type
                  for (x in data) { 
                      if (param[x] != null) { 
                          list.add(param[x]); 
                      } 
                  } 
              }; 
              if (list.isEmpty()) { 
                  return data;  // PFIZER
              } else { 
                  return list;  // list["STUB", "STUB"]
              }"
        },
        "size" : 50
      }
    }
  }
}

According to the results, 48 STUB should be printed, but 47 STUB are being printed.

"aggregations": {
      "649": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "STUB",   // ALLOGENE THERAPEUTICS + CELLECTIS
               "doc_count": 47  // It has to be 48 !!
            },
            {
               "key": "PFIZER",
               "doc_count": 4
            }
         ]
      }
   }

I've tried many things, but I think there's probably a problem with the list type.

I don't think I'm bringing all the elements.

I'd appreciate it if you could give me your opinion.

++ Additional

          def param = new groovy.json.JsonSlurper().parseText(
              '{\"ALLOGENE THERAPEUTICS\": \"STUB\", \"CELLECTIS\": \"STUB\"}'
          ); 
          def data = doc['my_field'].values; 
          def list = [];
          if (!doc['my_field'].empty) {   // my_field is list type
              for (x in data) { 
                  if (param[x] != null) { 
                      list.add(param[x]); 
                  } else {
                      list.add(x);
                  }
              } 
          }; 
          return list;

It is the same even if I try with the corresponding script query.

Tomo_M · January 12, 2022, 5:41am

Are there any documents with both A.. and C..?
As it is "doc_count", it does not double count documents with two STUBs. Can you identify the dropped document and share it?

If you want to just add the two values, sum bucket aggregation might be help.

hyun · January 12, 2022, 5:50am

This is the result of aggregation to top_hits. It can be identified by documentId. How can I count in duplicate?

hyun · January 12, 2022, 5:53am

As you said, there is a document in which "A.." and "C.." overlap. That's why there's one missing. How can we include each of them in the "count"?

hyun · January 12, 2022, 5:55am

The duplicated document "_id" is "us011072644b2".

Tomo_M · January 12, 2022, 6:40am

I suppose you need alternative methods.

or sum aggregation for runtime field with scrpt counting values may work.

hyun · January 12, 2022, 7:15am

@Tomo_M Can you tell me how to use sum aggregation?
In general, sum aggregation is used to calculate values of numeric types. How can I combine doc_count with sum aggregation?

Tomo_M · January 13, 2022, 1:19pm

Alternative 1:

GET /test_sum_bucket_agg/_search
{
  "size":0,
  "aggs":{
    "f":{
      "filters": {
        "filters": {
          "all": {"match_all":{}}
        }
      },
      "aggs":{
        "sename":{
          "terms":{
            "field":"currentApplicantInfo.seName"
          }
        },
        "STUB":{
          "bucket_script": {
            "buckets_path": {
              "countA": "sename['ALLOGENE THERAPEUTICS']>_count",
              "countC": "sename['CELLECTIS']>_count"
            },
            "script": "params.countA + params.countC",
            "format": "#"
          }
        }
      }
    }
  }
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "f" : {
      "buckets" : {
        "all" : {
          "doc_count" : 3,
          "sename" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "ALLOGENE THERAPEUTICS",
                "doc_count" : 3
              },
              {
                "key" : "CELLECTIS",
                "doc_count" : 1
              },
              {
                "key" : "PFIZER",
                "doc_count" : 1
              }
            ]
          },
          "STUB" : {
            "value" : 4.0,
            "value_as_string" : "4"
          }
        }
      }
    }
  }
}

Alternative 2:

GET /test_sum_bucket_agg/_search
{
  "runtime_mappings": {
    "countAC": {
      "type": "long",
      "script":{
        "lang": "painless",
        "source": "int x=0; for (sename in doc['currentApplicantInfo.seName']){if ((sename == 'ALLOGENE THERAPEUTICS') || (sename == 'CELLECTIS')){x = x+1}} emit(x)"
      }
    }
  },
  "size":0,
  "fields": [
    {"field": "countAC"}
  ],
  "aggs":{
    "STUB":{
      "sum":{
        "field": "countAC",
        "format": "#"
      }
    },
    "terms":{
      "terms":{
        "field":"currentApplicantInfo.seName"
      }
    }
  }
}

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "ALLOGENE THERAPEUTICS",
          "doc_count" : 3
        },
        {
          "key" : "CELLECTIS",
          "doc_count" : 1
        },
        {
          "key" : "PFIZER",
          "doc_count" : 1
        }
      ]
    },
    "STUB" : {
      "value" : 4.0,
      "value_as_string" : "4"
    }
  }
}

I also recommend to do it on the client side as he said on another topic.

system · February 10, 2022, 1:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch Script Query - Returning multiple values on aggregation Elasticsearch	1	355	February 4, 2022
Painless terms script aggregations Elasticsearch	3	3342	November 10, 2017
Return many values from Painless script to aggregation Elasticsearch	2	1370	December 15, 2017
Script to return array for scripted metric aggregation from combine Elasticsearch	3	3132	July 6, 2017
Scripting aggregation on Object Elasticsearch aggregations	4	148	April 8, 2024

Script Query - Returning multiple values on aggregation

Related topics