Filter Aggregation with a lot of aggs

I have a pretty specific use case that I have working at the moment, but I'm wondering if there is a better, cleaner way to do it.

  • I have 10 aggs
  • For all aggs filters can and will be applied
  • Results should be filtered by all aggs filters
  • Each agg should be filtered by all aggs filters except their own

Websites such as Amazon work like this.

I have solved this for now by using post_filter for the results and a Filter Aggregation for each aggregation's filters.
However, when all 10 filters are applied, each aggregation will have 9 filters added to them. This seems a bit silly.
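
For reference, a minimal sketch of the setup described above, written as a Python helper that builds the search body. The function name, the `values` sub-aggregation name and the colour/size/brand field names are assumptions for illustration only; the request shape (a `filter` aggregation per facet plus a `post_filter`) is what the post describes.

```python
from typing import Dict, List


def terms_filter(field: str, values: List[str]) -> dict:
    """One terms clause for a single facet selection."""
    return {"terms": {field: values}}


def build_facet_request(facet_fields: List[str],
                        selections: Dict[str, List[str]]) -> dict:
    """Build a search body where each facet is filtered by every selection
    except its own, and the hits are filtered by all selections."""
    aggs = {}
    for field in facet_fields:
        # Every selection except the one made on this facet's own field.
        others = [terms_filter(f, v) for f, v in selections.items() if f != field]
        facet_filter = {"bool": {"filter": others}} if others else {"match_all": {}}
        aggs[field] = {
            "filter": facet_filter,
            "aggs": {"values": {"terms": {"field": field}}},
        }

    body = {"query": {"match_all": {}}, "aggs": aggs}
    all_filters = [terms_filter(f, v) for f, v in selections.items()]
    if all_filters:
        # The hits themselves must satisfy every selection.
        body["post_filter"] = {"bool": {"filter": all_filters}}
    return body


# Hypothetical field names and selections.
fields = ["colour", "size", "brand"]
body = build_facet_request(
    fields, {"colour": ["blue"], "size": ["large"], "brand": ["brandA"]}
)
```

With N facets that all have a selection, each facet aggregation carries N - 1 filter clauses, which is the repetition the question is about.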

In Solr there is functionality for a facet to exclude one or more facets, which can include the facet itself, thus resolving the above use case.

Is there another way to solve this use case?

Thanks for any advice!

I believe you could instead add a global aggregation with a filter sub-aggregation, maybe?

HTH

There's a trick you can apply here in the query:

bool
	must: [
		query_string: [user input],
		bool
			minimum_should_match: 2
			should: [
				bool:
					should: [
						color: blue,
						color: green
					]
				bool:
					should: [
						size: L
					]
				bool:
					should: [
						brand: X,
						brand: Y
					]
			]
	]

Then just have your aggs tree unfiltered.
The minimum_should_match value should be one less than the number of dimensions that have a selection (in the above case size/colour/brand, so 2). This ensures that a match which misses the selection in one dimension of the agg tree still satisfies all the other category selections.

The downside is that the hits only need to match all but one of the selected criteria, but you can apply a post_filter to tighten them up to match all selections. This may mean repeating the original criteria, but it is better than the alternative of each of your 10 dimensions requiring a filter with the other 9.
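
A rough sketch of how such a request could be assembled programmatically, assuming a text field called `description` and a hypothetical helper name `build_relaxed_request`. It only builds the request body; it does not talk to a cluster.

```python
from typing import Dict, List


def build_relaxed_request(text: str,
                          selections: Dict[str, List[str]],
                          facet_fields: List[str]) -> dict:
    """Query matches all-but-one selection, post_filter matches all,
    and the aggs tree stays unfiltered."""
    # Only dimensions that actually have a selection count.
    clauses = [{"terms": {f: v}} for f, v in selections.items() if v]

    # "description" is an assumed field name for the user's text input.
    bool_query = {"must": [{"match": {"description": text}}]}
    if clauses:
        bool_query["should"] = list(clauses)
        # One less than the number of dimensions with a selection.
        bool_query["minimum_should_match"] = len(clauses) - 1

    body = {
        "query": {"bool": bool_query},
        # Plain terms aggregations, no per-facet filters.
        "aggs": {f: {"terms": {"field": f}} for f in facet_fields},
    }
    if clauses:
        # Tighten the hits so they match every selection.
        body["post_filter"] = {"bool": {"must": list(clauses)}}
    return body


request = build_relaxed_request(
    "tshirt",
    {"colour": ["red", "blue"], "size": ["large"], "brand": ["brandA"]},
    ["colour", "size", "brand"],
)
```

Dimensions without a selection are left out of the `should` list, so they do not affect the `minimum_should_match` value.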

Hi Mark,

Sorry I couldn't get around to this any sooner. It's still not entirely clear to me what's happening here. Is this the regular query, but with the filters already applied? I prefer not to implement something I don't understand myself and can't explain to my colleagues, since we need to expand on this later.

Is it possible (I'm really sorry for asking this) to create a complete example with regular search, aggregations and filters given the above parameters? I totally understand if this is too much of an effort.

Maybe I should ask another question: is it bad to send as many filter aggregations as I will be doing, as described in the OP? I'm not sure what impact a big search / filter query will have.

Hi!

Thanks for your reply! I'm not really sure how this would solve my use case though, other than moving the post_filters to the regular filters by wrapping them in a global aggregation.
Or is there something I'm missing here?

Working example:

// Setup sample index and data. We have 3 faceting dimensions (colour/size/brand)
DELETE test
PUT test
{
  "settings": {
	"number_of_replicas": 0,
	"number_of_shards": 1
  },
  "mappings": {
	"doc":{
	  "properties": {
		"colour":{"type":"keyword"},
		"size":{"type":"keyword"},
		"brand":{"type":"keyword"},
		"description":{"type":"text"}
	  }
	}
  }
}
POST test/doc/1
{
  "size":"large",
  "colour":"red",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/2
{
  "size":"large",
  "colour":"blue",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/3
{
  "size":"small",
  "colour":"blue",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/4
{
  "size":"small",
  "colour":"blue",
  "brand":"brandB",
  "description":"tshirt"
}
POST test/doc/5
{
  "size":"large",
  "colour":"blue",
  "brand":"brandB",
  "description":"tshirt"
}




// Run a search where user has made selections (in this case in all 3 dimensions)


POST test/_search
{
  "query": {
	"bool": {
	  "must": [
		{
		  "match": {
			"description": "tshirt"
		  }
		}
	  ],
	  
	  // This is set to one-less than the number of dimensions with criteria set
	  "minimum_should_match": 2, 
	  
	  "should":[
		{
		  "terms": { "colour": ["red", "blue"]}
		},
		{
		  "terms": { "size": ["large"]}
		},
		{
		  "terms": { "brand": ["brandA"]}
		}
		]
	}
  },
  
  // This is required to tighten up the hits to match ALL user choices and is a copy
  // of the user selections in a must.
  "post_filter": {
	"bool":{
	  "must": [
		{
		  "terms": { "colour": ["red", "blue"]}
		},
		{
		  "terms": { "size": ["large"]}
		},
		{
		  "terms": { "brand": ["brandA"]}
		}        
		]
	}
  }, 
  "size": 10,
  
  // The aggs tree is simple - no filters required
  "aggs": {
	"size": {
	  "terms": {
		"field": "size"
	  }
	},
	"colour": {
	  "terms": {
		"field": "colour"
	  }
	},
	"brand": {
	  "terms": {
		"field": "brand"
	  }
	}
  }
}
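
To see what the request above selects without a running cluster, here is a small pure-Python simulation over the same five sample documents. It is only a sketch of the matching logic (every document contains "tshirt", so the `must` clause is ignored), and the helper names are made up for illustration.

```python
from collections import Counter

# The five sample documents indexed above.
DOCS = [
    {"size": "large", "colour": "red", "brand": "brandA"},
    {"size": "large", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandB"},
    {"size": "large", "colour": "blue", "brand": "brandB"},
]
# The user selections from the search request.
SELECTED = {"colour": {"red", "blue"}, "size": {"large"}, "brand": {"brandA"}}


def matched_dimensions(doc):
    """Number of selected dimensions this document satisfies."""
    return sum(doc[field] in values for field, values in SELECTED.items())


def facet_counts(docs):
    """Terms-style counts per field over the given documents."""
    return {field: dict(Counter(d[field] for d in docs)) for field in SELECTED}


def strict_counts(field):
    """Counts for one facet filtered by every selection except its own."""
    others = {f: v for f, v in SELECTED.items() if f != field}
    docs = [d for d in DOCS if all(d[f] in v for f, v in others.items())]
    return dict(Counter(d[field] for d in docs))


# Query stage: at least (number of selected dimensions - 1) clauses match.
relaxed = [d for d in DOCS if matched_dimensions(d) >= len(SELECTED) - 1]
# post_filter stage: every selection must match.
hits = [d for d in relaxed if matched_dimensions(d) == len(SELECTED)]

relaxed_counts = facet_counts(relaxed)
strict = {field: strict_counts(field) for field in SELECTED}

print(len(relaxed), len(hits))
print(relaxed_counts)
print(strict)
```

In this simulation four documents pass the query stage and two remain after the post_filter. The counts for values that are not selected (small, brandB) agree with a strict per-facet filter, while the counts for already-selected values (large, blue, brandA) come out higher, because a document can match the selected value while missing one other dimension. Whether that matters depends on how the counts for selected values are displayed.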

Hello Mark,

I'm wondering, what if I want to expand my query with some more "should" queries, to search and boost on specific fields?
Let's say I want to add a should search on:
title: boost 10
description: boost 5
usps: boost 1

My search results "should" match 1 of the 3 here; won't this interfere with the filters' "should" queries? I would have to increase "minimum_should_match" by 1, because the search only needs to match one field, but then the number of matched filter queries won't be correct anymore.

Or should I work with nested bool queries in the main query? One for filters and one for the regular search part?

Put all of that in a child bool query under the "must" clause.
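
One way this could look, sketched as a Python helper that builds only the `query` part. The field names and boosts are the ones from the question; the helper name and the choice of `match` queries are assumptions. The text clauses sit in their own bool under `must`, so their `minimum_should_match` is independent of the one used for the facet selections.

```python
from typing import Dict, List

# Fields and boosts taken from the question.
BOOSTS = {"title": 10, "description": 5, "usps": 1}


def build_query(text: str, selections: Dict[str, List[str]]) -> dict:
    """Text search in a child bool under must, facet selections as the
    outer should clauses with their own minimum_should_match."""
    text_bool = {
        "bool": {
            "should": [
                {"match": {field: {"query": text, "boost": boost}}}
                for field, boost in BOOSTS.items()
            ],
            # The text only has to match one of the boosted fields.
            "minimum_should_match": 1,
        }
    }
    facet_clauses = [{"terms": {f: v}} for f, v in selections.items() if v]

    outer = {"must": [text_bool]}
    if facet_clauses:
        outer["should"] = facet_clauses
        # One less than the number of dimensions with a selection.
        outer["minimum_should_match"] = len(facet_clauses) - 1
    return {"bool": outer}


query = build_query("tshirt", {"colour": ["red", "blue"], "size": ["large"]})
```

The post_filter and aggs parts of the request stay as in the working example.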

{
"aggregations": {
	"categories": {
		"nested": {
			"path": "categories"
		},
		"aggregations": {
			"translated": {
				"terms": {
					"field": "categories.nl.raw",
					"min_doc_count": 0
				}
			}
		}
	},
	"target_audiences": {
		"nested": {
			"path": "target_audiences"
		},
		"aggregations": {
			"translated": {
				"terms": {
					"field": "target_audiences.nl.raw",
					"min_doc_count": 0
				}
			}
		}
	}
},
"query": {
	"bool": {
		"must": {
			"match_all": {}
		},
		"minimum_should_match": 0,
		"should": [{
			"nested": {
				"path": "target_audiences",
				"query": {
					"terms": {
						"target_audiences.nl.raw": ["Doelgroep 2"]
					}
				}
			}
		}]
	}
},
"post_filter": {
	"bool": {
		"must": [{
			"nested": {
				"path": "target_audiences",
				"query": {
					"terms": {
						"target_audiences.nl.raw": ["Doelgroep 2"]
					}
				}
			}
		}]
	}
}

}

Filters work on the results (because of the post_filter, I guess?), but now none of the aggregations are filtered by each other. I'm still doing something wrong in the "should", I guess?
