I was hoping to be able to do a terms aggregation on the 'matched_queries' field that is generated when doing a named query. Since 'matched_queries' is added outside the document, it is not currently accessible as a field for a terms aggregation. I am curious whether this could be a feature request, or whether it is just not possible due to how aggregations are built.
I had a similar disappointment when I discovered I couldn't use these names in terms aggs.
Unfortunately, this information is only derived in the fetch phase for individual docs, not inline in the collect phase when aggs run.
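Within that limitation, the closest client-side approximation is to tally the names over whatever hits you actually fetch. Below is a minimal Python sketch, assuming the search response has already been parsed into a dict that mirrors Elasticsearch's JSON (`hits.hits[*].matched_queries`); `count_matched_queries` is a hypothetical helper, not part of any client library. Because the names only exist after the fetch phase, this sees just the returned page of hits (e.g. the top `size`), not every matching document, so it is not equivalent to a terms agg. It also accepts `matched_queries` as a name-to-score object, in case your version returns it in that form.

```python
from collections import Counter

def count_matched_queries(response: dict) -> Counter:
    """Tally named-query names across the hits in one parsed search response.

    Only the fetched hits are visible here, so these counts are NOT equivalent
    to a terms aggregation over every matching document.
    """
    counts = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        # matched_queries is absent when a hit matched no named clause;
        # list() yields the names whether it is a list or a name->score dict
        counts.update(list(hit.get("matched_queries", [])))
    return counts

response = {"hits": {"hits": [
    {"_id": "1", "matched_queries": ["elasticstack"]},
    {"_id": "2", "matched_queries": ["elasticstack", "lucene"]},
    {"_id": "3"},
]}}
print(dict(count_matched_queries(response)))  # {'elasticstack': 2, 'lucene': 1}
```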
> I am curious if this could be a feature request
It is likely to require changes to core Lucene. My view is that Lucene Query clauses currently produce only a score for each stream of matching docs. Maybe, as in the Lucene tokenization API [1], additional metadata could optionally be emitted via a search equivalent of TokenStream Attribute objects.
This would allow each doc to have arbitrary "match metadata" attached, as if the values were properties of the document. A name/tag is one example of such metadata: e.g. your Boolean OR query that looks for the terms `elasticsearch`, `logstash` or `kibana` could associate the user-supplied tag `elasticstack` with each match for use in aggs. As well as explicitly tagging a user-defined category like this, you could attach a numeric measure of belonging to a category. For example, music-listener profiles could be ranked on their "death-metal-ness" or "jazz-ness" by supplying lists of bands in these genres and returning the number of band names a user matched in each query clause. These numbers provide a level of "about-ness" that could be plotted in a histogram agg etc.
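To make the idea concrete, here is a toy, in-memory Python model of the proposal. It is purely illustrative: it is not Lucene or Elasticsearch code, and none of these names exist in either. Each named OR clause contributes to the score as usual but also emits per-doc metadata (the number of listed bands a user matched), which then feeds a terms-style count of clause names and a histogram of "jazz-ness". The user and band data and the functions `run_clauses`, `terms_agg` and `histogram_agg` are hypothetical.

```python
from collections import Counter

# Made-up example data: the bands each user listens to.
USERS = {
    "u1": {"Death", "Miles Davis", "The Beatles"},
    "u2": {"John Coltrane", "Miles Davis", "Bill Evans"},
    "u3": {"Cannibal Corpse", "Death", "Obituary"},
    "u4": {"The Beatles"},
}

# Named OR clauses: a doc matches a clause if it contains any listed band.
GENRE_CLAUSES = {
    "death-metal": {"Death", "Obituary", "Cannibal Corpse", "Morbid Angel"},
    "jazz": {"Miles Davis", "John Coltrane", "Bill Evans", "Thelonious Monk"},
}

def run_clauses(bands):
    """Score one doc and emit per-clause "match metadata" alongside the score.

    Today only the summed score survives matching; the metadata dict
    (clause name -> number of bands matched) is the hypothetical extra output.
    """
    score, metadata = 0.0, {}
    for name, clause in GENRE_CLAUSES.items():
        matched = len(bands & clause)
        if matched:
            score += matched
            metadata[name] = matched
    return score, metadata

def terms_agg(all_metadata):
    """Count docs per matched clause name, like a terms agg on the tags."""
    return Counter(name for md in all_metadata.values() for name in md)

def histogram_agg(all_metadata, clause, interval=1):
    """Bucket docs by how many terms they matched in one clause."""
    buckets = Counter()
    for md in all_metadata.values():
        if clause in md:
            buckets[(md[clause] // interval) * interval] += 1
    return dict(sorted(buckets.items()))

all_metadata = {user: run_clauses(bands)[1] for user, bands in USERS.items()}
print(dict(terms_agg(all_metadata)))        # {'death-metal': 2, 'jazz': 2}
print(histogram_agg(all_metadata, "jazz"))  # {1: 1, 3: 1}
```

The point of the model is that the metadata travels with each match during collection, so aggregations can read it the same way they read a stored field.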
Some of this is achievable today if you mess around with boosts, constant_score and scripted aggs to smuggle metadata out in Lucene's single float score, but it is a less-than-ideal way of getting at the details behind Lucene's matching logic.
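One concrete form of that workaround, offered as a sketch under assumptions rather than a recommendation: wrap each named clause in `constant_score` with a distinct power-of-two `boost` inside a `bool` `should`, so a doc's score is the sum of the boosts of the clauses it matched and can be decoded as a bit mask (for example from `_score` in a scripted agg). This assumes the bool query simply adds up matching clause scores, that nothing else contributes to the score, and that there are at most 24 clauses so the sum stays exact in a 32-bit float. The helpers `build_bitmask_query` and `decode_score` are hypothetical, and the field name `body` in the usage lines is made up.

```python
def build_bitmask_query(named_clauses: dict) -> dict:
    """Wrap clause i in constant_score with boost 2**i inside a bool/should.

    Assuming bool/should sums clause scores, a doc's score is an integer
    whose bit i says whether clause i matched.
    """
    if len(named_clauses) > 24:
        # float32 represents integers exactly only up to 2**24
        raise ValueError("too many clauses to decode exactly from a float score")
    should = []
    for i, (name, clause) in enumerate(named_clauses.items()):
        should.append({
            "constant_score": {
                "filter": clause,
                "boost": float(2 ** i),
                "_name": name,  # keeps matched_queries working as well
            }
        })
    return {"query": {"bool": {"should": should}}}

def decode_score(score: float, names: list) -> list:
    """Recover which named clauses matched from a bit-mask score."""
    mask = int(round(score))
    return [name for i, name in enumerate(names) if mask & (1 << i)]

clauses = {
    "elasticstack": {"terms": {"body": ["elasticsearch", "logstash", "kibana"]}},
    "lucene": {"term": {"body": "lucene"}},
    "solr": {"term": {"body": "solr"}},
}
query = build_bitmask_query(clauses)
print(decode_score(5.0, list(clauses)))  # ['elasticstack', 'solr']
```

A doc scoring 5.0 matched clauses 0 and 2. The obvious cost is that the score no longer expresses relevance, which is why this remains a workaround rather than a substitute for real match metadata.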