Filter Aggregation with a lot of aggs


(Roy Van Ginneken) #1

I have a pretty specific use case, which I have working at the moment, but I'm wondering whether there is a better, cleaner way to do it.

  • I have 10 aggs
  • Filters can and will be applied to all aggs
  • Results should be filtered by all agg filters
  • Each agg should be filtered by all agg filters except its own

Websites like Amazon work this way.
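To pin down the requirement in the list above, here is a small in-memory sketch in plain Python (not Elasticsearch; the documents and the helpers `matches` and `facet_counts` are hypothetical): hits must satisfy every selection, while each facet is counted with every selection except its own.

```python
from collections import Counter

# Hypothetical catalogue: one value per facet dimension per document.
DOCS = [
    {"size": "large", "colour": "red", "brand": "brandA"},
    {"size": "large", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandB"},
    {"size": "large", "colour": "blue", "brand": "brandB"},
]


def matches(doc, selections, skip=None):
    """True if doc satisfies every selected dimension except `skip`."""
    return all(
        doc[dim] in values
        for dim, values in selections.items()
        if dim != skip
    )


def facet_counts(docs, selections, dims):
    """Hits honour every selection; each facet ignores only its own selection."""
    hits = [d for d in docs if matches(d, selections)]
    facets = {
        dim: Counter(d[dim] for d in docs if matches(d, selections, skip=dim))
        for dim in dims
    }
    return hits, facets


selections = {"colour": {"red", "blue"}, "size": {"large"}, "brand": {"brandA"}}
hits, facets = facet_counts(DOCS, selections, ["size", "colour", "brand"])
for dim, counts in facets.items():
    print(dim, dict(counts))
```

With `large` selected, the `size` facet still reports `small` (1 document), so the user can see how many results a different size would give.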

For now I have solved this by using a post_filter for the results and a filter aggregation for each aggregation's filters.
However, when all 10 filters are applied, each aggregation gets the other 9 filters added to it. This seems a bit silly.
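A minimal sketch of that approach, in Python that only builds the request body as a dict. The helper name `build_request` and the `match_all` placeholder for the user's search are assumptions, not from the post. It makes the growth visible: with N selected dimensions, each selected facet's filter aggregation carries N - 1 clauses.

```python
import json


def build_request(selections, facet_fields):
    """Sketch of the post_filter + filter-aggregation approach.

    selections: {field: [values]} for the dimensions the user has filtered on.
    facet_fields: every field that gets a terms aggregation.
    """
    def clause(field):
        return {"terms": {field: list(selections[field])}}

    aggs = {}
    for field in facet_fields:
        # Each facet is filtered by every selection except its own, so with
        # N selected dimensions a selected facet carries N - 1 clauses.
        others = [clause(f) for f in selections if f != field]
        aggs[field] = {
            "filter": {"bool": {"filter": others}},
            "aggs": {field: {"terms": {"field": field}}},
        }

    return {
        # Placeholder for the user's search query (assumption).
        "query": {"match_all": {}},
        # Hits must satisfy all selections; post_filter does not affect aggs.
        "post_filter": {"bool": {"filter": [clause(f) for f in selections]}},
        "aggs": aggs,
    }


body = build_request(
    {"colour": ["red", "blue"], "size": ["large"], "brand": ["brandA"]},
    ["colour", "size", "brand"],
)
print(json.dumps(body, indent=2))
```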

In Solr a facet can exclude the filters of one or more facets, including its own, which solves the use case above.

Is there another way to handle this use case?

Thanks for any advice!


(David Pilato) #2

I believe you could instead add a global aggregation, maybe with a sub filter aggregation?

HTH


(Mark Harwood) #3

There's a trick you can apply here in the query:

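{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "[user input]" } },
        {
          "bool": {
            "minimum_should_match": 2,
            "should": [
              { "bool": { "should": [ { "term": { "color": "blue" } }, { "term": { "color": "green" } } ] } },
              { "bool": { "should": [ { "term": { "size": "L" } } ] } },
              { "bool": { "should": [ { "term": { "brand": "X" } }, { "term": { "brand": "Y" } } ] } }
            ]
          }
        }
      ]
    }
  }
}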

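Then just leave your aggs tree unfiltered.
The minimum_should_match setting should be one less than the number of dimensions that have selections (size/colour/brand in the example above). That way, a match that fails the selection in one dimension must still satisfy the selections in all the other dimensions, so it is counted in that dimension's agg as if only that dimension's own filter had been dropped.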

The downside is that the hits are only guaranteed to match all but one of the selected criteria, but you can apply a post_filter to tighten them up so they match all selections. This duplicates the original criteria, but it is better than the alternative of each of your 10 dimensions requiring a filter with the other 9.
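The trick can be checked with a small in-memory simulation in plain Python (hypothetical documents and selections; `clauses_matched` and `exact` are illustrative helpers, not Elasticsearch APIs). It mimics a bool query whose should clauses are the dimension selections with minimum_should_match = dimensions - 1, unfiltered terms aggs over the query matches, and a post_filter requiring every selection. For comparison it also computes the "every filter except the facet's own" counts directly.

```python
from collections import Counter

# Hypothetical documents and selections, one value per dimension per doc.
DOCS = [
    {"size": "large", "colour": "red", "brand": "brandA"},
    {"size": "large", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandA"},
    {"size": "small", "colour": "blue", "brand": "brandB"},
    {"size": "large", "colour": "blue", "brand": "brandB"},
]
SELECTIONS = {"colour": {"red", "blue"}, "size": {"large"}, "brand": {"brandA"}}


def clauses_matched(doc):
    """How many dimension clauses (the bool "should" entries) a doc satisfies."""
    return sum(doc[f] in values for f, values in SELECTIONS.items())


# Query: minimum_should_match = number of selected dimensions - 1.
msm = len(SELECTIONS) - 1
query_matches = [d for d in DOCS if clauses_matched(d) >= msm]

# Unfiltered aggs see everything the query matched.
trick = {f: Counter(d[f] for d in query_matches) for f in SELECTIONS}

# post_filter: returned hits must satisfy every selection.
hits = [d for d in query_matches if clauses_matched(d) == len(SELECTIONS)]


def exact(dim):
    """Reference counts: every selection applied except the facet's own."""
    return Counter(
        d[dim]
        for d in DOCS
        if all(d[f] in v for f, v in SELECTIONS.items() if f != dim)
    )


for dim in SELECTIONS:
    print(dim, "trick:", dict(trick[dim]), "exact:", dict(exact(dim)))
```

In this run the counts for values not yet selected (`small`, `brandB`) agree with the exclude-own-filter counts, while counts for already-selected values can come out higher (`large`, `blue` and `brandA` here), because a document can land in a selected bucket while failing a different dimension.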


(Roy Van Ginneken) #4

Hi Mark,

Sorry I couldn't get to this sooner. It's still not entirely clear to me what's happening here. Is this the regular query, but with the filters already applied? I prefer not to implement something I don't understand myself and can't explain to my colleagues, since we need to build on this later.

Is it possible (I'm really sorry for asking this) to create a complete example with regular search, aggregations and filters, given the above parameters? I totally understand if this is too much effort.

Maybe I should ask a different question: is it bad to send as many filter aggregations as I would be sending, as described in the OP? I'm not sure what impact a big search/filter query will have.


(Roy Van Ginneken) #5

Hi!

Thanks for your reply! I'm not really sure how this would solve my use case, though, apart from moving the post_filters into regular filters by wrapping them in a global aggregation.
Or is there something I'm missing here?


(Mark Harwood) #6

Working example:

// Set up sample index and data. We have 3 faceting dimensions (colour/size/brand)
DELETE test
PUT test
{
  "settings": {
	"number_of_replicas": 0,
	"number_of_shards": 1
  },
  "mappings": {
	"doc":{
	  "properties": {
		"colour":{"type":"keyword"},
		"size":{"type":"keyword"},
		"brand":{"type":"keyword"},
		"description":{"type":"text"}
	  }
	}
  }
}
POST test/doc/1
{
  "size":"large",
  "colour":"red",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/2
{
  "size":"large",
  "colour":"blue",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/3
{
  "size":"small",
  "colour":"blue",
  "brand":"brandA",
  "description":"tshirt"
}
POST test/doc/4
{
  "size":"small",
  "colour":"blue",
  "brand":"brandB",
  "description":"tshirt"
}
POST test/doc/5
{
  "size":"large",
  "colour":"blue",
  "brand":"brandB",
  "description":"tshirt"
}

// Run a search where the user has made selections (in this case in all 3 dimensions)

POST test/_search
{
  "query": {
	"bool": {
	  "must": [
		{
		  "match": {
			"description": "tshirt"
		  }
		}
	  ],
	  
	  // Set to one less than the number of dimensions that have criteria set
	  "minimum_should_match": 2, 
	  
	  "should":[
		{
		  "terms": { "colour": ["red", "blue"]}
		},
		{
		  "terms": { "size": ["large"]}
		},
		{
		  "terms": { "brand": ["brandA"]}
		}
		]
	}
  },
  
  // This is required to tighten up the hits to match ALL user choices and is a copy
  // of the user selections in a must.
  "post_filter": {
	"bool":{
	  "must": [
		{
		  "terms": { "colour": ["red", "blue"]}
		},
		{
		  "terms": { "size": ["large"]}
		},
		{
		  "terms": { "brand": ["brandA"]}
		}        
		]
	}
  }, 
  "size": 10,
  
  // The aggs tree is simple - no filters required
  "aggs": {
	"size": {
	  "terms": {
		"field": "size"
	  }
	},
	"colour": {
	  "terms": {
		"field": "colour"
	  }
	},
	"brand": {
	  "terms": {
		"field": "brand"
	  }
	}
  }
}
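If the selections come from a UI, a request shaped like the one above can be generated instead of hand-written. A minimal Python sketch, assuming selections arrive as a dict of field to values and that the free text is matched against `description` as in the example; `build_search` is a hypothetical helper name.

```python
import json


def build_search(text, selections, facet_fields, size=10):
    """Builds a request shaped like the working example above.

    text: the free-text query (matched against "description" here).
    selections: {field: [values]} for dimensions with user selections.
    facet_fields: fields that get a plain, unfiltered terms aggregation.
    """
    clauses = [{"terms": {f: list(v)}} for f, v in selections.items()]
    bool_query = {"must": [{"match": {"description": text}}]}
    body = {
        "query": {"bool": bool_query},
        "size": size,
        "aggs": {f: {"terms": {"field": f}} for f in facet_fields},
    }
    if clauses:
        bool_query["should"] = clauses
        # One less than the number of dimensions that have criteria set.
        bool_query["minimum_should_match"] = len(clauses) - 1
        # The same clauses as a must, to tighten the hits to ALL selections.
        body["post_filter"] = {"bool": {"must": clauses}}
    return body


body = build_search(
    "tshirt",
    {"colour": ["red", "blue"], "size": ["large"], "brand": ["brandA"]},
    ["size", "colour", "brand"],
)
print(json.dumps(body, indent=2))
```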

(Roy Van Ginneken) #7

Hello Mark,

I'm wondering, what if I want to expand my query with some more "should" queries, to search and boost on specific fields?
Let's say I want to add a should search on:
title: boost 10
description: boost 5
usps: boost 1

My search results "should" match 1 of these 3; won't this interfere with the filters' "should" queries? I would need to increase "minimum_should_match" by 1, because a result only needs to match one of them, but then the count of matched filter clauses would no longer be correct.

Or should I work with nested bool queries in the main query? One for filters and one for the regular search part?


(Mark Harwood) #8

Put all that in a child bool query under the "must" clause
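A minimal sketch of that structure in Python, building the query as a dict. The boosts come from the question above; the helper name `build_query` is hypothetical. Because minimum_should_match applies per bool query, the text-search should clauses inside the child bool do not count toward the outer facet minimum_should_match.

```python
import json


def build_query(text, selections):
    """Sketch: boosted text search in a child bool under "must".

    The child bool's should clauses are matched on their own, so they do
    not count toward the outer facet minimum_should_match.
    """
    text_query = {
        "bool": {
            "should": [
                {"match": {"title": {"query": text, "boost": 10}}},
                {"match": {"description": {"query": text, "boost": 5}}},
                {"match": {"usps": {"query": text, "boost": 1}}},
            ],
            # A result must match at least one of the text fields.
            "minimum_should_match": 1,
        }
    }
    facet_clauses = [{"terms": {f: list(v)}} for f, v in selections.items()]
    outer = {"must": [text_query]}
    if facet_clauses:
        outer["should"] = facet_clauses
        # Still one less than the number of dimensions with selections.
        outer["minimum_should_match"] = len(facet_clauses) - 1
    return {"query": {"bool": outer}}


example = build_query(
    "tshirt", {"colour": ["red", "blue"], "size": ["large"], "brand": ["brandA"]}
)
print(json.dumps(example, indent=2))
```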


(Roy Van Ginneken) #9
{
"aggregations": {
	"categories": {
		"nested": {
			"path": "categories"
		},
		"aggregations": {
			"translated": {
				"terms": {
					"field": "categories.nl.raw",
					"min_doc_count": 0
				}
			}
		}
	},
	"target_audiences": {
		"nested": {
			"path": "target_audiences"
		},
		"aggregations": {
			"translated": {
				"terms": {
					"field": "target_audiences.nl.raw",
					"min_doc_count": 0
				}
			}
		}
	}
},
"query": {
	"bool": {
		"must": {
			"match_all": {}
		},
		"minimum_should_match": 0,
		"should": [{
			"nested": {
				"path": "target_audiences",
				"query": {
					"terms": {
						"target_audiences.nl.raw": ["Doelgroep 2"]
					}
				}
			}
		}]
	}
},
"post_filter": {
	"bool": {
		"must": [{
			"nested": {
				"path": "target_audiences",
				"query": {
					"terms": {
						"target_audiences.nl.raw": ["Doelgroep 2"]
					}
				}
			}
		}]
	}
}

}

The filters work on the results (because of the post_filter, I guess?), but now none of the aggregations are filtered by each other. I guess I'm still doing something wrong in the "should"?
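A simplified simulation may help show what the query in this post does. It assumes flat, hypothetical documents (the nested mapping is ignored) and mimics Elasticsearch's rule that, alongside a must clause, should clauses are optional when minimum_should_match is 0.

```python
from collections import Counter

# Hypothetical flat documents (the nested mapping in the post is ignored here).
DOCS = [
    {"categories": "cat-1", "target_audiences": "Doelgroep 1"},
    {"categories": "cat-1", "target_audiences": "Doelgroep 2"},
    {"categories": "cat-2", "target_audiences": "Doelgroep 2"},
    {"categories": "cat-3", "target_audiences": "Doelgroep 1"},
]
SELECTIONS = {"target_audiences": {"Doelgroep 2"}}

# One selected dimension, so "one less" gives minimum_should_match = 0.
msm = len(SELECTIONS) - 1


def clauses_matched(doc):
    """How many selection clauses (the "should" entries) a doc satisfies."""
    return sum(doc[f] in values for f, values in SELECTIONS.items())


# With a "must" present and msm = 0, the should clause is optional.
query_matches = [d for d in DOCS if clauses_matched(d) >= msm]
categories_agg = Counter(d["categories"] for d in query_matches)

# What the categories facet should show: only target audience "Doelgroep 2".
expected = Counter(
    d["categories"]
    for d in DOCS
    if d["target_audiences"] in SELECTIONS["target_audiences"]
)
print("agg:", dict(categories_agg), "expected:", dict(expected))
```

In this sketch every document matches the query, so the `categories` counts are not narrowed by the `Doelgroep 2` selection at all: with only one dimension selected, "one less than the number of dimensions" gives 0, and the should clause no longer filters anything.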


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.