I have an Elasticsearch Query DSL request. The field 'arr_color' holds a single string such as "red_blu_black", and I aggregate on it with a script like this:
{
  "runtime_mappings": {},
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "(arr_color:(*bl*))"
        }
      }
    }
  },
  "aggs": {
    "field_occurences": {
      "terms": {
        "size": 10000,
        "script": {
          "source": """
            def array_list = Arrays.asList(doc['arr_color.keyword'].value.splitOnToken('_'));
            List result = [];
            for (i in array_list) {
              if (i.contains('bl')) {
                result.add(i);
              }
            }
            return result;
          """
        }
      },
      "aggs": {
        "paging": {
          "bucket_sort": {
            "from": 0,
            "size": 10
          }
        },
        "is_all_deleted": {
          "min": {
            "field": "is_deleted"
          }
        }
      }
    }
  }
}
The result is the set of elements obtained by splitting the arr_color field (those that contain 'bl'),
for example [blu, black, blue, ...].
But this seems expensive to compute when the field contains many colors and there are many documents.
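To make clear what the script does for each document, here is a small Python sketch of the same logic. The function name tokens_containing and its parameters are only illustrative and are not part of Elasticsearch; the real work is done by the Painless script above.

```python
def tokens_containing(value, needle="bl", sep="_"):
    """Split value on sep and keep only the tokens that contain needle."""
    return [token for token in value.split(sep) if needle in token]


# The sample value from the question yields the two matching tokens.
print(tokens_containing("red_blu_black"))  # ['blu', 'black']
```

This split and filter runs once per matching document, which is why I expect the cost to grow with both the number of documents and the number of colors in each value.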
How can I optimize this? And can I register this aggregation in ES as some kind of mapping, so that next time I want to use it I only need to call something like
"agg": {
  "token_string": {
    "field": "arr_color"
  }
}
Also, would a mapped aggregation like that run faster than sending the script every time I query? Thanks