Manipulating scope in an Elasticsearch aggregation?

Hello,

I'm running into a problem implementing a multi-select checkbox backed by aggregations. The use case is:

  • User makes a text-based query
  • The UI shows a list of categories; the user may select one or more to filter the results of subsequent queries.
  • Ideally, the categories displayed would be all categories matching the text-based query, but not constrained by which values have been selected for filtering.

As I understand it, the aggregation bucket domain is either the search context, or it is the global context, potentially with additional filters applied in either case. Is there a corresponding aggregation to say that it is the search context, but without certain filters applied?

More concretely, say we have a food index with fruit (apples, bananas, plantains, and grapes) and dairy (cheese, milk, and yogurt).

I'd like an aggregation that has counts for all the fruit (apples, bananas, plantains, and grapes) while the search hits are limited to what the user selected, say just the apples and bananas.

{
	"query": {
		"bool": {
			"must": {
				"term": {
					"food_group": "fruit"
				}
			},
			"filter": {
				"terms": {
					"food_name": ["apple", "banana"]
				}
			}
		}
	},
	"aggs": {
		"kinds_o_fruit": {
			"terms": {
				"field": "food_name"
			}
		}
	}
}

As-is, "kinds_o_fruit" will only return buckets for foods with food_group: fruit and food_name apple or banana. If food_name is a multi-valued field, there might be additional values: apples, bananas, and plantains would make sense here.

As I understand it, the only way to get a "kinds_o_fruit" aggregation that covers all kinds of fruit would be to nest it in a global aggregation, with the original term query repeated as a filter:

{
	"query": {
		"bool": {
			"must": {
				"term": {
					"food_group": "fruit"
				}
			},
			"filter": {
				"terms": {
					"food_name": ["apple", "banana"]
				}
			}
		}
	},
	"aggs": {
		"all_food": {
			"global": {},
			"aggs": {
				"kinds_o_fruit": {
					"filter": {
						"term": {
							"food_group": "fruit"
						}
					},
					"aggs": {
						"title_filtered": {
							"terms": {
								"field": "food_name"
							}
						}
					}
				}
			}
		}
	}
}

As long as the initial text query is a simple "term" query, this is not unreasonable. But if the initial query is something more elaborate, it has to be duplicated inside the aggregation, and the performance cost and maintenance complexity could quickly get out of hand.

Is there a way, perhaps with named queries, to specify the aggregation context as the documents matching the initial "must" clause? Ideally, I'd like the documents in the query domain, but without the named filters applied.

Still working through the examples, but I think this is a case where the post_filter will help.

The post_filter is a filtering operation that is performed after the aggregation has been computed. So that would allow your aggregation to show apples, bananas, plantains, grapes (aggregating on all the fruit category basically), then a post_filter is applied which further restricts the search hits down to apples, bananas. The post_filter is only applied to search hits, not aggregations, so it can be used as a final "phase" for filtering what the user sees without touching the aggs.
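Using the food example from the question, the request might look roughly like this. It's only a sketch: it assumes food_group and food_name are keyword-type (not analyzed) fields, so the term/terms lookups and the terms aggregation see the exact values. The selected values move out of the query and into a top-level post_filter, so the aggregation still sees every fruit:

```json
{
	"query": {
		"term": {
			"food_group": "fruit"
		}
	},
	"aggs": {
		"kinds_o_fruit": {
			"terms": {
				"field": "food_name"
			}
		}
	},
	"post_filter": {
		"terms": {
			"food_name": ["apple", "banana"]
		}
	}
}
```

Here "kinds_o_fruit" should return buckets for all the fruit, while the hits contain only apples and bananas. If you end up with several multi-select facets, each facet's aggregation would probably need to be wrapped in a filter aggregation that applies the other facets' selections, since post_filter never touches the aggs.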

I think that will help achieve what you're looking for, and keep the query/aggs reasonably simple. Lemme know if that helps! 🙂
