2.x shingle aggregation feature is lost in 5.x?


(Ray33) #1

In ES 2.x, a string field with a shingle analyzer (see the mapping below) could be used in a nested terms aggregation, and because of the shingle analyzer the inner aggregation returned the most common phrases.
For example, given this data:
Country, data
US, donald trump
US, donald duck
US, donald duck
US, donald trump
US, donald trump

The query would show that "donald trump" is the most searched term in the US and "donald duck" is the second.

In ES 5.x, where a string field must be either text or keyword, the query options are limited to:

  • keyword field - top hits word - a single-word aggregation, which means that for the above example it shows only donald as the most searched word.
  • text field (the type that supports the shingle analyzer) - it doesn't allow aggregations.

Any suggestions on how to achieve a top-searches aggregation for phrases in ES 5.x?

This is a sample of the shingle definition in 2.x:

{
  "mappings": {
    "searches": {
      "properties": {
        "q": {
          "type": "string",
          "analyzer": "shingle_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "filter": ["lowercase", "shingle_filter"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

The nested aggregation would be similar to this:

{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "countries": {
      "terms": {
        "field": "country"
      },
      "aggregations": {
        "top_searches": {
          "terms": {
            "field": "q",
            "size": 3,
            "min_doc_count": 2,
            "include": ".{4,35}"
          }
        }
      }
    }
  }
}
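
One possible way to approach this in 5.x, offered as an untested sketch rather than a confirmed answer: text fields reject aggregations by default because fielddata is disabled on them, and the 5.x mapping option "fielddata": true turns it back on. The mapping below keeps the analysis settings from the 2.x sample unchanged and only changes the type of q. The keyword mapping for country is an assumption, since the original mapping does not define that field.

```json
{
  "mappings": {
    "searches": {
      "properties": {
        "country": {
          "type": "keyword"
        },
        "q": {
          "type": "text",
          "analyzer": "shingle_analyzer",
          "fielddata": true
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "filter": ["lowercase", "shingle_filter"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
```

With a mapping like this, the aggregation request above should not need to change. Fielddata for a shingled field is held on the heap and can grow large, so it is worth checking memory use on realistic data before relying on it.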

