Hi,
We are using Elasticsearch 5.3.0 and noticed a dramatic difference in performance between Groovy and Painless scripts.
The index has 5 shards and about 30M documents, with a simple mapping (about 15 fields; all string fields are mapped as keyword).
The following query takes over 70 seconds to complete. While it is running, I see spikes in both CPU utilization and young-generation GC activity.
POST test-idx/_search
{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "match_all" : { }
  },
  "aggregations" : {
    "Hostname" : {
      "terms" : {
        "script" : {
          "inline" : "(_source.Hostname == null) ? null : _source.Hostname",
          "lang" : "groovy"
        },
        "missing" : "NULL_STRING_TAG",
        "size" : 2147483647
      }
    }
  }
}
After switching to Painless, the same request takes about 4 seconds, and neither the CPU spikes nor the young GC activity appear:
POST test-idx/_search
{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "match_all" : { }
  },
  "aggregations" : {
    "Hostname" : {
      "terms" : {
        "script" : {
          "inline" : "doc['Hostname'].value == null ? null : doc['Hostname'].value",
          "lang" : "painless"
        },
        "missing" : "NULL_STRING_TAG",
        "size" : 2147483647
      }
    }
  }
}
Is there an explanation for this drastic difference in performance? Is there a way to optimize the Groovy script so that it runs faster, given that it already seems very simple to me?
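One variable that may be worth isolating: the two scripts above differ in more than language. The Groovy script reads each value from _source, while the Painless script reads it through doc[...], which is backed by doc values. Below is a minimal, untested sketch of the Groovy request rewritten to use doc values, so that only the script language changes between the two runs. It assumes Hostname is a keyword field with doc values enabled (the default for keyword) and that inline Groovy scripting is enabled on the cluster.

```
POST test-idx/_search
{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "match_all" : { }
  },
  "aggregations" : {
    "Hostname" : {
      "terms" : {
        "script" : {
          "inline" : "doc['Hostname'].value == null ? null : doc['Hostname'].value",
          "lang" : "groovy"
        },
        "missing" : "NULL_STRING_TAG",
        "size" : 2147483647
      }
    }
  }
}
```

If this variant runs in roughly the same time as the Painless query, the gap would point to _source access rather than to Groovy itself.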