You can use a scripted terms aggregation for this.
Assuming your data has been indexed like this:
POST _bulk
{"index" : {"_index": "my_index", "_type": "doc", "_id": "1"}}
{"category": "BUG", "user": "Peter", "description": "It's a Windows issue bla bla", "date": "2017-01-15"}
{"index" : {"_index": "my_index", "_type": "doc", "_id": "2"}}
{"category": "BUG", "user": "Peter", "description": "It's a Linux and Windows combined issue bla bla", "date": "2017-01-16"}
{"index" : {"_index": "my_index", "_type": "doc", "_id": "3"}}
{"category": "BUG", "user": "Peter", "description": "It's a Linux issue bla bla", "date": "2017-01-17"}
The following aggregation request:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "Member": {
      "terms": {
        "script": {
          "source": """
            if (doc['description.keyword'].value =~ /.*Linux/) {
              return "Type A";
            }
            else {
              return "Type B";
            }
          """,
          "lang": "painless"
        },
        "size": 10
      }
    }
  }
}
returns this:
"Member": {
  "doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 0,
  "buckets": [
    {
      "key": "Type A",
      "doc_count": 2
    },
    {
      "key": "Type B",
      "doc_count": 1
    }
  ]
}
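To see why the buckets come out as 2 and 1, the script's branching can be reproduced locally on the three sample descriptions. This is only a rough Python sketch of the logic, not Elasticsearch code: `bucket_key` is a hypothetical helper, and `re.search` stands in for the Painless `=~` (find) operator.

```python
import re
from collections import Counter

# The three sample descriptions from the bulk request.
descriptions = [
    "It's a Windows issue bla bla",
    "It's a Linux and Windows combined issue bla bla",
    "It's a Linux issue bla bla",
]

# Same pattern as in the Painless script.
LINUX = re.compile(r".*Linux")


def bucket_key(description: str) -> str:
    """Hypothetical helper mirroring the script's if/else."""
    # re.search succeeds if the pattern matches anywhere in the string,
    # which is how the Painless find operator (=~) behaves.
    return "Type A" if LINUX.search(description) else "Type B"


# Count documents per returned key, as the terms aggregation does per bucket.
buckets = Counter(bucket_key(d) for d in descriptions)
print(buckets.most_common())  # [('Type A', 2), ('Type B', 1)]
```

Documents 2 and 3 mention Linux, so they land in "Type A"; document 1 falls through to "Type B". If the script used the match operator (`==~`) instead, the pattern would have to cover the whole value, much like `re.fullmatch` in Python.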
To use regular expressions in Painless scripts, you first need to enable them in the elasticsearch.yml
configuration file (this is a static setting, so the node has to be restarted afterwards):
script.painless.regex.enabled: true