Terms aggregation on named queries

I was hoping to run a terms aggregation on the 'matched_queries' field that is generated when running a named query. Since matched_queries is added outside the document, it is not currently accessible as a field for a terms aggregation. Could this be a feature request, or is it simply not possible because of how aggregations are built?
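For context, here is a minimal sketch of the situation, written as Python dicts rather than against a live cluster. The field name `body` and the query names `es` and `ls` are hypothetical. Each clause is tagged with `_name`, and every returned hit carries a `matched_queries` list. The helper tallies those names client-side, which only covers the hits actually returned, not the whole result set the way an aggregation would.

```python
from collections import Counter

# Request body with two named clauses. The field "body" and the names
# "es" and "ls" are hypothetical, chosen only for illustration.
search_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"body": {"query": "elasticsearch", "_name": "es"}}},
                {"match": {"body": {"query": "logstash", "_name": "ls"}}},
            ]
        }
    }
}


def tally_matched_queries(response):
    """Count how often each query name appears across the returned hits."""
    counts = Counter()
    for hit in response["hits"]["hits"]:
        # matched_queries sits next to _source on the hit, not inside it.
        counts.update(hit.get("matched_queries", []))
    return counts


# Hand-written stand-in for a search response, trimmed to the relevant keys.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["es", "ls"]},
            {"_id": "2", "matched_queries": ["es"]},
        ]
    }
}

counts = tally_matched_queries(sample_response)
```

Because the tally runs over fetched hits, it is a workaround for small result sets rather than a substitute for a terms aggregation.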

I had a similar disappointment when I discovered I couldn't use these names in terms aggs.

Unfortunately, this information is only derived in the fetch phase, for individual docs, and not inline in the collect phase when aggs run.

It is likely to require changes to core Lucene. My view was that Lucene Query clauses currently gather only a score for each stream of matching docs. Maybe, like the Lucene tokenization API [1], additional metadata could optionally be emitted via a search equivalent of TokenStream Attribute objects.

This would allow each doc to have arbitrary "match metadata" attached as if it were a property of the document. A name/tag is one example of such metadata, e.g. your Boolean OR query that looks for the terms elasticsearch, logstash or kibana could associate the user-supplied tag elasticstack for use in aggs. As well as tagging a user-defined category like this, you could attach a numeric measure of belonging to a category, e.g. music-listener profiles could be ranked on their "death-metal-ness" or "jazz-ness" by supplying lists of bands in these genres and returning the number of band names a user matched in each query clause. These numbers provide a level of "about-ness" which could be plotted in a histogram agg etc.
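To make the "about-ness" idea concrete, here is a toy simulation in plain Python. It does not use Lucene or Elasticsearch at all. The genre lists, band names and user profiles are all made-up placeholders, and `match_metadata` is a hypothetical stand-in for what a query clause might emit alongside its score.

```python
from collections import Counter

# Hypothetical genre definitions: each "query clause" is a list of band names.
GENRES = {
    "death_metal": {"band_a", "band_b", "band_c"},
    "jazz": {"band_x", "band_y"},
}

# Hypothetical listener profiles: the bands each user has listened to.
PROFILES = {
    "user1": {"band_a", "band_b", "band_x"},
    "user2": {"band_c"},
    "user3": {"band_x", "band_y"},
}


def match_metadata(bands):
    """Per-genre count of matched band names for one profile."""
    return {genre: len(bands & names) for genre, names in GENRES.items()}


def histogram(genre):
    """Number of profiles at each match count for the given genre."""
    return Counter(match_metadata(bands)[genre] for bands in PROFILES.values())


user1_meta = match_metadata(PROFILES["user1"])
jazz_hist = histogram("jazz")
```

The point of the sketch is only that a per-clause match count behaves like a numeric document property once it is available at collect time, so it could feed a histogram the same way a stored field does.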

Some of this is achievable today if you mess around with boosts, constant_score and scripted aggs to smuggle metadata out in Lucene's single float score, but it is a less than ideal way of getting at the details behind the Lucene matching logic.
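A rough sketch of one way that score-smuggling trick could look, under several assumptions: each tagged clause is wrapped in constant_score with a distinct power-of-two boost, the bool query sums the clause scores so the total acts as a bitmask, and a terms agg scripted on `_score` buckets docs by that mask. The field name `tags_field` is hypothetical, and whether a `_score` script is accepted in a terms agg may depend on your Elasticsearch version, so treat the request body as unverified. Only the decoding helper is exercised here.

```python
# Tags to detect; the position in the list decides the bit each one occupies.
TAGS = ["elasticsearch", "logstash", "kibana"]


def build_body(field, tags):
    """Build a search body whose summed score encodes which clauses matched."""
    should = [
        {
            "constant_score": {
                "filter": {"term": {field: tag}},
                # Boosts 1, 2, 4, ... so every combination sums to a unique value.
                "boost": float(2 ** i),
            }
        }
        for i, tag in enumerate(tags)
    ]
    return {
        "size": 0,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        # Assumption: the terms agg accepts a script that returns _score.
        "aggs": {"by_score": {"terms": {"script": {"source": "_score"}}}},
    }


def decode(score, tags):
    """Turn a summed score (bucket key) back into the list of matched tags."""
    mask = int(round(score))
    return [tag for i, tag in enumerate(tags) if mask & (1 << i)]


body = build_body("tags_field", TAGS)  # "tags_field" is a hypothetical field
matched = decode(5.0, TAGS)            # 5 = 1 + 4
```

This only stays exact while the sums are small enough to be represented precisely in a single-precision float, which limits the number of clauses, and it takes over the score so it can no longer be used for relevance ranking.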

