I have a bunch of elastic search documents that contain information about jobs ads. I'm trying to aggregate the **title**
field to extract the number of "experience" instances from the job posting. e.g. Junior, Senior, Lead, etc. Instead what I'm getting are buckets that match the title as a whole instead of the each word it the title field. e.g. "Junior Java Developer", "Senior .NET Analyst", etc.
How can I tell elastic search to split the aggregation based on each word in the title as opposed the matching the value of the whole field.
I would later like to expand the query to also extract the "skill level" and "role", but it should also be fine if the buckets contain all the words in the field as long as they are split into separate buckets.
I do not want to enable "fielddata = true" since it consumes a lot of memory and is already established as a bad practice. How else can I implement it?
Current query:
GET /jobs/_search
{
"query": {
"match_all": {}
},
"aggs": {
"group_by_state": {
"terms": {
"field": "title"
}
}
}
}
Unwanted Output:
{
...
"hits": {
"total": 63,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 14,
"buckets": [{
"key": "Junior Java Tester",
"doc_count": 6
},{
"key": "Senior Java Lead",
"doc_count": 6
},{
"key": "Intern Java Tester",
"doc_count": 5
},
...
]
}
}
}
Desired Output:
{
...
"hits": {
"total": 63,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 14,
"buckets": [{
"key": "Junior",
"doc_count": 12
},{
"key": "Senior",
"doc_count": 8
},{
"key": "Tester",
"doc_count": 5
},{
"key": "Intern",
"doc_count": 5
},{
"key": "Analyst",
"doc_count": 5
},
...
]
}
}
}