Script aggregation is significantly slower than non-script aggregation

Hi All,

I am running into a significant performance issue when using a scripted aggregation. Here is my test request, which uses a script to map field values:

GET _search
{
  "size": 0,
  "aggs": {
    "id_host_status": {
      "terms": {
        "size": 0,
        "script": "if (doc['HOST_STATUS'].value == 'Closed_Adm' || doc['HOST_STATUS'].value == 'Closed_Full' ||  doc['HOST_STATUS'].value == 'Closed_LIM' ) { return 'Close'} else if(doc['HOST_STATUS'].value == 'Unavailable' || doc['HOST_STATUS'].value == 'Unavail'){return 'Unavail'} else { return doc['HOST_STATUS'].value} "
      }
    }
  }
}

It took 17+ seconds to go through around 9M records:

{
  "took": 17608,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 8834017,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "id_host_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "OK",
          "doc_count": 8299098
        },
        {
          "key": "Unavail",
          "doc_count": 381977
        },
        {
          "key": "busy",
          "doc_count": 150535
        },
        {
          "key": "-OK",
          "doc_count": 2403
        },
        {
          "key": "-busy",
          "doc_count": 4
        }
      ]
    }
  }
}

which is much slower than a plain terms aggregation (400+ ms in the case below):

GET _search
{
  "size": 0,
  "aggs": {
    "id_host_status": {
      "terms": {
        "field": "HOST_STATUS"
      }
    }
  }
}

Am I doing this the wrong way? Or is there a better way available?

Many thanks.

Jin

Performance-wise, scripting is indeed slower in most cases. If this is an aggregation you run often, I would suggest altering the values in the pipeline (Logstash) before indexing the documents. In other words, move that logic to index time if possible; it will speed up aggregations significantly.

One way without using scripts is as follows:

GET lastfmusers2/_search
{
  "size": 0,
  "aggs": {
    "GroupedTerms": {
      "filters": {
        "filters": {
          "Europe": {
            "terms": {
              "country.raw": [
                "France",
                "Germany",
                "Spain",
                ...
              ]
            }
          },
          "Asia": {
            "terms": {
              "country.raw": [
                "China",
                "Japan",
                "India",
                ...
              ]
            }
          }
        }
      }
    },
    "OtherUngroupedTerms": {
      "terms": {
        "field": "country.raw",
        "exclude": [
          "France",
          "Germany",
          "Spain",
          ...
          "China",
          "Japan",
          "India",
          ...
        ]
      }
    }
  }
}
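
Applied to the HOST_STATUS mapping from the original question, the same pattern might look roughly like the sketch below. It assumes HOST_STATUS is indexed as an exact-value (not_analyzed) field, so that the terms filters match the raw strings. The aggregation names status_groups and ungrouped_statuses are made up for illustration.

```
GET _search
{
  "size": 0,
  "aggs": {
    "status_groups": {
      "filters": {
        "filters": {
          "Close": {
            "terms": {
              "HOST_STATUS": ["Closed_Adm", "Closed_Full", "Closed_LIM"]
            }
          },
          "Unavail": {
            "terms": {
              "HOST_STATUS": ["Unavailable", "Unavail"]
            }
          }
        }
      }
    },
    "ungrouped_statuses": {
      "terms": {
        "field": "HOST_STATUS",
        "exclude": ["Closed_Adm", "Closed_Full", "Closed_LIM", "Unavailable", "Unavail"]
      }
    }
  }
}
```

The grouped counts come back under status_groups, and every remaining value keeps its own bucket under ungrouped_statuses, which corresponds to the final else branch of the script.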

Joar,
Thanks for your suggestion. In my case the user may modify the mapping at any time, so it is hard to apply the mapping before indexing. :frowning:

Hi Mark,

It looks great. Is there any way to move "OtherUngroupedTerms" into "GroupedTerms"? I need to run other sub-aggregations under "GroupedTerms".

Otherwise, I would have to duplicate that logic and add it under both "GroupedTerms" and "OtherUngroupedTerms".

Thanks,
Jin

@Jin_Guo, have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html#_literal_other_literal_bucket for how to use the other bucket in the filters aggregation.
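
A rough sketch of what that could look like for the HOST_STATUS case, assuming an Elasticsearch version whose filters aggregation supports other_bucket and other_bucket_key as described on the linked page, and assuming HOST_STATUS is an exact-value (not_analyzed) field. The names status_groups, Other and by_status are placeholders, not anything from the thread.

```
GET _search
{
  "size": 0,
  "aggs": {
    "status_groups": {
      "filters": {
        "other_bucket": true,
        "other_bucket_key": "Other",
        "filters": {
          "Close": {
            "terms": {
              "HOST_STATUS": ["Closed_Adm", "Closed_Full", "Closed_LIM"]
            }
          },
          "Unavail": {
            "terms": {
              "HOST_STATUS": ["Unavailable", "Unavail"]
            }
          }
        }
      },
      "aggs": {
        "by_status": {
          "terms": {
            "field": "HOST_STATUS"
          }
        }
      }
    }
  }
}
```

Sub-aggregations declared once under status_groups are computed for every bucket, including Other, so the logic does not have to be duplicated. Note that the other bucket collects all unmatched documents into a single bucket; a terms sub-aggregation such as by_status is one way to break it down again by the original value.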