Terms aggregation on named queries

(Jonsgreen) #1

I was hoping to be able to do a terms aggregation on the 'matched_queries' field that is generated when using named queries. Since matched_queries is added outside the document, it is not accessible as a field for a terms aggregation at this time. I am curious if this could be a feature request, or is it just not possible due to how aggregations are built?
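
For context, here is a minimal sketch of what is possible today, assuming a search response in which each hit carries a `matched_queries` list. The query names (`es`, `ls`), the `title` field, the sample response, and the helper `count_matched_queries` are all made up for illustration. It only tallies the hits that were actually returned, so it is no substitute for a real aggregation over the whole result set.

```python
from collections import Counter

# Hypothetical request body: two clauses named via the `_name` parameter.
request_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": {"query": "elasticsearch", "_name": "es"}}},
                {"match": {"title": {"query": "logstash", "_name": "ls"}}},
            ]
        }
    }
}


def count_matched_queries(response):
    """Tally matched_queries names across the hits present in the response."""
    counts = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        counts.update(hit.get("matched_queries", []))
    return counts


# Hand-written sample response for illustration, not real output.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["es"]},
            {"_id": "2", "matched_queries": ["es", "ls"]},
            {"_id": "3", "matched_queries": ["ls"]},
        ]
    }
}

counts = count_matched_queries(sample_response)
```

Because the tally runs client-side over the fetched page only, the counts change with `size` and paging, which is exactly the limitation a terms aggregation would avoid.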

(Mark Harwood) #2

I had a similar disappointment when I discovered I couldn't use these names in terms aggs.

Unfortunately, this information is only derived in the fetch phase for individual docs, not inline in the collect phase when aggs run.

"I am curious if this could be a feature request"

It is likely to require changes to core Lucene. My view was that Lucene Query clauses currently gather only a score for each stream of matching docs. Maybe, like the Lucene tokenization API [1], additional metadata could optionally be emitted via a search equivalent of TokenStream Attribute objects.

This would allow each doc to have arbitrary "match metadata" attached as if it were a property of the document. A name/tag is one example of such metadata, e.g. your Boolean OR query that looks for the terms elasticsearch, logstash or kibana could associate the user-supplied tag elasticstack for use in aggs. As well as tagging a user-defined category like this, you could attach a numeric measure of belonging to a category, e.g. music-listener profiles could be ranked on their "death-metal-ness" or "jazz-ness" by supplying lists of bands in these genres and returning the number of band names a user matched in each query clause. These numbers provide a level of "about-ness" which could be plotted in a histogram agg etc.
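
To make the "about-ness" idea concrete, here is a toy simulation in plain Python. It does not use any Elasticsearch or Lucene API (none exists for this); the genre lists, profiles, and the `aboutness` and `histogram` helpers are hypothetical stand-ins for what per-clause match metadata might feed into a histogram agg.

```python
from collections import Counter

# Hypothetical band lists, standing in for the term lists of two query clauses.
GENRES = {
    "death_metal": {"Death", "Obituary", "Morbid Angel"},
    "jazz": {"Miles Davis", "John Coltrane", "Bill Evans"},
}


def aboutness(listened):
    """Number of band names matched per genre for one listener profile."""
    bands = set(listened)
    return {genre: len(bands & names) for genre, names in GENRES.items()}


def histogram(profiles, genre):
    """Number of profiles per match count for one genre."""
    return Counter(aboutness(profile)[genre] for profile in profiles)


# Made-up listener profiles.
profiles = [
    ["Death", "Obituary", "Miles Davis"],
    ["Miles Davis", "John Coltrane", "Bill Evans"],
    ["Death"],
]

jazz_hist = histogram(profiles, "jazz")
```

Here the per-genre match count plays the role of the numeric metadata a query clause would emit for each doc, and `histogram` plays the role of the agg that buckets on it.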

Some of this is achievable today if you mess around with boosts, constant_score and scripted aggs to smuggle metadata out in Lucene's single float score, but it is a less than ideal way of getting at the details behind the Lucene matching logic.
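
One way this smuggling might look is sketched below, under some loud assumptions: each clause is wrapped in constant_score with a power-of-two boost, the bool query's score is the plain sum of the matching clauses (no coord factor or query normalization, which depends on the version and similarity in use), and a scripted terms agg can read `_score`. The field name `body`, the agg name `match_combos`, the script syntax, and the helpers `build_request` and `decode` are illustrative, not a tested recipe.

```python
# Bit i of the score corresponds to NAMES[i].
NAMES = ["elasticsearch", "logstash", "kibana"]


def build_request(field="body"):
    """Build a request whose score encodes which clauses matched as a bitmask."""
    should = [
        {
            "constant_score": {
                "filter": {"term": {field: name}},
                "boost": float(1 << i),  # 1.0, 2.0, 4.0, ...
            }
        }
        for i, name in enumerate(NAMES)
    ]
    return {
        "size": 0,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        # Assumption: the agg script can see the score; syntax varies by version.
        "aggs": {"match_combos": {"terms": {"script": {"source": "_score"}}}},
    }


def decode(score):
    """Turn a bitmask score (e.g. a bucket key) back into clause names."""
    mask = int(round(score))
    return [name for i, name in enumerate(NAMES) if mask & (1 << i)]


request = build_request()
matched = decode(5.0)  # bits 0 and 2 set
```

Each bucket key would then be a combination of clauses rather than a single name, so per-name counts need a further client-side sum over the buckets. A float score also only represents such sums exactly for a limited number of clauses, which is part of why this is a less than ideal workaround.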

[1] https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/analysis/TokenStream.html
