Filtering Aggregations

(Graham Roberts) #1


First post here, so hello, nice to meet you all.

I am playing with aggregations and getting some great results, but I have one particular use case where I don't have an optimal result yet, and I was hoping for a steer.

Each document has a locality which is an array structured like this to support hierarchical searching:
0/United Kingdom,
1/United Kingdom/Berkshire,
2/United Kingdom/Berkshire/Reading

The numeric prefix is a depth indicator that is used when filtering.
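
As a rough illustration of that structure, the sketch below builds the depth-prefixed entries from an ordered list of place names. The function name `locality_paths` is made up for this example and is not part of Elasticsearch or of my indexing code.

```python
def locality_paths(parts):
    """Return one 'depth/path' entry per level of the hierarchy."""
    return [
        f"{depth}/{'/'.join(parts[:depth + 1])}"
        for depth in range(len(parts))
    ]


# Each entry repeats its full parent path, prefixed with its depth.
for entry in locality_paths(["United Kingdom", "Berkshire", "Reading"]):
    print(entry)
```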

I am creating an autosuggest of available localities. When someone types in "rea", I only want to return this result in the aggregation bucket:
2/United Kingdom/Berkshire/Reading

However, in order to make the autocomplete suggestions work, I am using a pattern tokenizer so that I search only the lowest fragment in the hierarchy:
'tokenizer' => array(
    'descendent' => array(
        'type' => 'pattern',
        'pattern' => '([^/]+$)', // Get everything after the last slash
        'group' => 1
    )
),
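
To show what this tokenizer is expected to emit, here is a small sketch that applies the same regular expression outside Elasticsearch. It uses Python's `re` module as a stand-in for the Java regex engine, which is an assumption that holds for this simple pattern; `descendent_tokens` is a hypothetical helper name.

```python
import re

# Same regular expression as the 'descendent' tokenizer.
LAST_FRAGMENT = re.compile(r"([^/]+$)")


def descendent_tokens(value, group=1):
    """Return the text captured by `group` for every match, as a token list."""
    return [m.group(group) for m in LAST_FRAGMENT.finditer(value)]


for value in [
    "0/United Kingdom",
    "1/United Kingdom/Berkshire",
    "2/United Kingdom/Berkshire/Reading",
]:
    print(value, "->", descendent_tokens(value))
```

Each value yields a single token, the text after its last slash, so the depth prefix and parent path never reach the token filters.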

I am using an edge n-gram filter so I can do a "starts with" style search:
'edge_ngram_filter' => array(
    'type' => 'edge_ngram',
    'min_gram' => 2,
    'max_gram' => 20,
),
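
For reference, this is roughly what the filter produces for one token. It is a plain Python approximation with the same `min_gram` and `max_gram` values, and it assumes the token has already been lowercased by an earlier filter; `edge_ngrams` is a hypothetical name.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with length from min_gram to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Assumes the token was lowercased before reaching this filter.
print(edge_ngrams("reading"))
```

Because "rea" is one of the prefixes indexed for "reading", a search for "rea" can match a document whose lowest locality fragment is "Reading".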

And then I put that together in my analyzer:
'edge_ngram_analyzer_descendent' => array(
    'type' => 'custom',
    'tokenizer' => 'descendent',
    'filter' => array(

But the difficulty is that I need to return a value from a different field in my aggregation: the untouched version. If I set up my aggregation like this:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "field_item_locality.edge_ngram_analyzer_descendent": "rea"
        }
      }
    }
  },
  "aggs": {
    "field_item_locality": {
      "terms": {
        "field": "field_item_locality.untouched",
        "size": 0
      }
    }
  }
}

Then I receive every item from the locality array of each matching document, not just the item that matched.
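
The behaviour can be reproduced without a cluster. The sketch below is a simplified model, not Elasticsearch itself: the sample documents are invented, `matches_prefix` is a rough stand-in for the analyzed sub-field (it ignores `min_gram`, `max_gram` and query-time analysis), and `terms_aggregation` simply counts every untouched value of every document that matched.

```python
from collections import Counter

# Hypothetical sample documents; only the first has a locality starting with "rea".
DOCS = [
    ["0/United Kingdom", "1/United Kingdom/Berkshire",
     "2/United Kingdom/Berkshire/Reading"],
    ["0/United Kingdom", "1/United Kingdom/Kent",
     "2/United Kingdom/Kent/Dover"],
]


def matches_prefix(value, typed):
    """True if the lowercased last path fragment starts with the typed text."""
    return value.rsplit("/", 1)[-1].lower().startswith(typed.lower())


def terms_aggregation(docs, typed):
    """Count all untouched values of every document with at least one match."""
    hits = [doc for doc in docs if any(matches_prefix(v, typed) for v in doc)]
    return Counter(value for doc in hits for value in doc)


print(terms_aggregation(DOCS, "rea"))
```

The query selects whole documents, so all three values of the first document end up as buckets even though only the Reading entry matched the typed text.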

And I can't use an include on the aggregation like:
"include": "rea.*"
because I would then have to aggregate on one of my analyzed fields to account for the lowercasing and the extraction of the lowest descendent, and that would no longer return the untouched value.
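
To make the constraint concrete: a pattern applied to the untouched field would have to skip the parent path and tolerate either letter case by itself. The sketch below shows what such a pattern might look like and checks it with Python's `re.fullmatch`, which stands in for an anchored match. Whether the `include` regex dialect of a given Elasticsearch version accepts this pattern is an assumption that has not been verified here, and `include_pattern` and `matching_values` are hypothetical helper names.

```python
import re


def include_pattern(typed):
    """Build a flag-free pattern: any parent path, then a case-insensitive prefix."""
    parts = []
    for ch in typed:
        if ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")  # match either case
        else:
            parts.append(re.escape(ch))  # escape anything that is not a letter
    return ".*/" + "".join(parts) + "[^/]*"


def matching_values(values, typed):
    """Keep only the values whose last path fragment starts with `typed`."""
    pattern = include_pattern(typed)
    return [v for v in values if re.fullmatch(pattern, v)]


values = [
    "0/United Kingdom",
    "1/United Kingdom/Berkshire",
    "2/United Kingdom/Berkshire/Reading",
]
print(include_pattern("rea"))
print(matching_values(values, "rea"))
```

The trailing `[^/]*` is what ties the prefix to the lowest fragment: a parent such as "Reading" inside a deeper path would not match, because a later slash would follow it.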

What direction should I be looking in to solve this?

Many thanks in advance

