Need multiple post_filter conditions


(Ferry Ardhana) #1

Hi,

I have a directory listing site that uses Elasticsearch as its search engine. On this site, people are able to:

  1. Find the most relevant documents by keyword (the keyword is checked against multiple fields)
  2. Sort by nearest location and limit results to a 20km radius

My approach is to use this JSON request body:
{ "query": { "bool": { "should": [ { "match": { "organic_keywords": { "query": "resto padang", "operator": "and", "boost": 0.3 } } }, { "match": { "keywords": { "query": "resto padang", "operator": "and", "boost": 0.4 } } }, { "match": { "category.id": { "query": "resto padang", "operator": "and", "boost": 0.7 } } }, { "match": { "category.en": { "query": "resto padang", "operator": "and", "boost": 0.7 } } }, { "match": { "name": { "query": "resto padang", "operator": "and", "boost": 1 } } }, { "match": { "popular_name": { "query": "resto padang", "operator": "and", "boost": 1.2 } } }, { "match": { "products.name": { "query": "resto padang", "operator": "and", "boost": 1.3 } } } ] } }, "sort": [ { "_geo_distance": { "coordinates": { "lat": "-6.244668099999999", "lon": "106.84092849999999" }, "order": "asc", "unit": "km" } } ], "post_filter": { "geo_distance": { "distance": "20km", "coordinates": { "lat": "-6.244668099999999", "lon": "106.84092849999999" } } }, "from": 0, "size": 10 }

Now I need one more filter: only search documents with the field published=1.
I planned to do that with post_filter, but as far as I know, post_filter does not accept an array of objects.

What is the solution for this situation?

Thanks


(Martijn Van Groningen) #2

You can just add a bool query inside post_filter, and inside that bool query add your geo_distance query and a term query (for published=1) as filter clauses.
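
As a rough sketch of that suggestion, post_filter could wrap both conditions in one bool query. The field names and values are taken from the request above; the main query, sort, from and size stay unchanged and are omitted, so this is only the post_filter part of the request body:

```json
{
  "post_filter": {
    "bool": {
      "filter": [
        { "term": { "published": 1 } },
        {
          "geo_distance": {
            "distance": "20km",
            "coordinates": {
              "lat": "-6.244668099999999",
              "lon": "106.84092849999999"
            }
          }
        }
      ]
    }
  }
}
```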

I do wonder why you use post_filter. It only makes sense in combination with aggregations, in order to show counts where certain filters are not applied. If you don't need that, then I recommend placing your filters in the filter clause of the bool query that you already have defined as your main query. This way the query will execute much more efficiently.
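
To illustrate the aggregation case: in the sketch below, the aggregation counts documents per category over everything the main query matches, while post_filter narrows only the returned hits. The field category_key (assumed to be a keyword field), the aggregation name per_category and the value "restaurant" are hypothetical and not part of the mapping discussed in this topic:

```json
{
  "query": {
    "match": { "name": "resto padang" }
  },
  "aggs": {
    "per_category": {
      "terms": { "field": "category_key" }
    }
  },
  "post_filter": {
    "term": { "category_key": "restaurant" }
  }
}
```

Without an aggregation like this, the same term condition could simply live in the main query's filter clause.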


(Ferry Ardhana) #3

Hi! Thanks for the reply. I will try it and get back with the result.

I use post_filter here to get documents that match the keyword and are still within the 20km radius.

I have tried it (using the query's filter clause instead of post_filter). If I put the geo filter in the query's filter clause, I also get documents that do not match the keyword, as less relevant results, right? We only want documents that match the keyword.

CMIIW. Anyway, let me know if there is a better approach for my search criteria.

Thanks


(Martijn Van Groningen) #4

Queries inside a bool query's filter clause don't contribute to the score, so they don't make a document more relevant. A filter clause either matches or it doesn't, and a document needs to match all filter clauses, otherwise it is not a hit. That behaviour is similar to what you have experienced when placing a query inside post_filter.


(Ferry Ardhana) #5

So in my case, both the radius filter and published=1 are better placed in the filter clause?

I have not tested it yet, I'm still on mobile.


(Martijn Van Groningen) #6

Yes, I think that would be better.


(Ferry Ardhana) #7

Hi, I've tried the bool query inside post_filter and it works. Thanks!

But the interesting thing is what happens when moving the post_filter query into the bool query.
As I said, the result is not as expected.

Documents that do not contain the keyword appear.
Do I need to create a separate topic for this?

Thanks


(Martijn Van Groningen) #8

Can you share the query that you're executing?


(Ferry Ardhana) #9

Here is my query.

{ "query": { "bool": { "filter": [ { "term": { "published": 1 } }, { "geo_distance": { "distance": "20km", "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" } } } ], "should": [ { "match": { "organic_keywords": { "query": "hotel murah", "operator": "and", "boost": 0.3 } } }, { "match": { "keywords": { "query": "hotel murah", "operator": "and", "boost": 0.4 } } }, { "match": { "category.id": { "query": "hotel murah", "operator": "and", "boost": 0.7 } } }, { "match": { "category.en": { "query": "hotel murah", "operator": "and", "boost": 0.7 } } }, { "match": { "name": { "query": "hotel murah", "operator": "and", "boost": 1 } } }, { "match": { "popular_name": { "query": "hotel murah", "operator": "and", "boost": 1.2 } } }, { "match": { "products.name": { "query": "hotel murah", "operator": "and", "boost": 1.3 } } } ] } }, "sort": [ { "_geo_distance": { "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" }, "order": "asc", "unit": "km" } } ], "from": 0, "size": 10 }

When I use this query, the total hit count is 98034,
but when I use post_filter the hit count is 93.

Thanks


(Martijn Van Groningen) #10

And can you also share the query with post_filter?


(Ferry Ardhana) #11

Using the query below, the hit count is 93.
Please check, thanks.

{ "query": { "bool": { "should": [ { "match": { "organic_keywords": { "query": "hotel murah", "operator": "and", "boost": 0.3 } } }, { "match": { "keywords": { "query": "hotel murah", "operator": "and", "boost": 0.4 } } }, { "match": { "category.id": { "query": "hotel murah", "operator": "and", "boost": 0.7 } } }, { "match": { "category.en": { "query": "hotel murah", "operator": "and", "boost": 0.7 } } }, { "match": { "name": { "query": "hotel murah", "operator": "and", "boost": 1 } } }, { "match": { "popular_name": { "query": "hotel murah", "operator": "and", "boost": 1.2 } } }, { "match": { "products.name": { "query": "hotel murah", "operator": "and", "boost": 1.3 } } } ] } }, "post_filter": { "bool": { "filter": [ { "term": { "published": 1 } }, { "geo_distance": { "distance": "20km", "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" } } } ] } }, "sort": [ { "_geo_distance": { "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" }, "order": "asc", "unit": "km" } } ], "from": 0, "size": 10 }


(Martijn Van Groningen) #12

Thanks for sharing that. The difference in hit count is unexpected to me. I think somehow the term query (for published=1) gets parsed differently in the two search requests.

  1. Can you execute both queries in the validate query API (/_validate/query?explain=true&rewrite=true)?

  2. Also, I'd like to know whether the results are still the same if a match query is used instead of the term query for published=1.


(Ferry Ardhana) #13

Hi Martijn, thanks a lot for your help.

The _validate endpoint doesn't support post_filter, sort, from and size; it gives an error. That's why I cannot share the output for the post_filter query.

Here is the output for the query with the filter clause:
{ "_shards": { "total": 1, "successful": 1, "failed": 0 }, "valid": true, "explanations": [ { "index": "yellowpages", "valid": true, "explanation": "(+organic_keywords:hotel +organic_keywords:murah)^0.3 (+keywords:hotel +keywords:murah)^0.4 (+category.id:hotel +category.id:murah)^0.7 (+category.en:hotel +category.en:murah)^0.7 (+name:hotel +name:murah) (+popular_name:hotel +popular_name:murah)^1.2 (+products.name:hotel +products.name:murah)^1.3 #published:[1 TO 1] #coordinates:-6.2446641,106.8409247 +/- 20000.0 meters" } ] }

I get the same result when trying with a match query. Do you have another clue? :)


(Martijn Van Groningen) #14

Ah, sorry for recommending that. It only supports queries inside a top-level query JSON object.
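
In other words, the request body for the validate API has to be reduced to the query object alone. A minimal sketch, assuming the index is named yellowpages (as in the output above) and the body is sent to /yellowpages/_validate/query?explain=true&rewrite=true; only two of the should clauses are shown to keep it short:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "published": 1 } },
        {
          "geo_distance": {
            "distance": "20km",
            "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" }
          }
        }
      ],
      "should": [
        { "match": { "name": { "query": "hotel murah", "operator": "and", "boost": 1 } } },
        { "match": { "popular_name": { "query": "hotel murah", "operator": "and", "boost": 1.2 } } }
      ]
    }
  }
}
```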

Can you set minimum_should_match to 1 on the bool query that has the filter clauses?
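
Concretely, the parameter sits next to filter and should inside the bool query. A shortened sketch of the request, showing only two of the should clauses (the remaining should clauses, sort, from and size are unchanged and left out):

```json
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "filter": [
        { "term": { "published": 1 } },
        {
          "geo_distance": {
            "distance": "20km",
            "coordinates": { "lat": "-6.2446641", "lon": "106.8409247" }
          }
        }
      ],
      "should": [
        { "match": { "name": { "query": "hotel murah", "operator": "and", "boost": 1 } } },
        { "match": { "popular_name": { "query": "hotel murah", "operator": "and", "boost": 1.2 } } }
      ]
    }
  }
}
```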


(Ferry Ardhana) #15

Cool!! The hit count is now the same as when I use post_filter, thanks Martijn.
What's the magic behind that clause?

And what are the advantages of using the query instead of post_filter? Does query performance increase?

Thanks!


(Martijn Van Groningen) #16

A bool query with only should clauses automatically sets minimum_should_match to 1 if it has not been specified. For a bool query with should clauses and required clauses (filter or must), minimum_should_match is set to 0 if it has not been specified. The idea behind this is that if there are only should clauses, at least one should clause must match (otherwise all documents would match). If there are required clauses, then the should clauses are optional by default. I had forgotten this detail :)

Yes, in many cases Elasticsearch is then able to execute the search request more efficiently, so if you can place queries in the bool query's filter clause, you should do that.


(Ferry Ardhana) #17

Thanks @mvg for your time and help.

Thank you!

