Query question using should and must

iitgrad · January 16, 2018, 8:45pm

I have documents which have a "category" field holding an array such as ["cat1", "cat2", "cat3"]. Some documents may have 20 or so different categories in the array; some may have only 1. I am trying to construct a query that does the following. Given a filter, say ["cat1", "cat3", "cat5"], I want to get all docs that contain all 3 of those categories first (in any order, and I'm not concerned if those docs also contain other categories), then the docs where only two of the 3 are included, then the docs that have any 1 of the 3 categories, and finally all documents that don't contain any of the 3 categories in the request.

I'm getting close, but my query still seems to rank docs with no hits above docs with at least one hit. Any help would be greatly appreciated.

Here is my current query:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              { "match": { "category": "cat1" } },
              { "match": { "category": "cat3" } },
              { "match": { "category": "cat5" } }
            ]
          }
        },
        {
          "bool": {
            "should": [
              { "match": { "category": "cat1" } },
              { "match": { "category": "cat3" } },
              { "match": { "category": "cat5" } }
            ]
          }
        }
      ]
    }
  },
  "_source": ["category"]
}
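A minimal sketch of one way to get the "rank by number of matched categories" ordering, assuming the categories are indexed as exact values under a `category.keyword` keyword sub-field (that field name appears later in this thread; adjust it to your mapping). Each category is wrapped in `constant_score`, so a match adds a flat 1.0 instead of a BM25 score that varies with how rare each category is. The `match_all` in `must` keeps documents that match none of the categories in the results; with only `should` clauses, a bool query requires at least one of them to match.

```json
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "should": [
        { "constant_score": { "filter": { "term": { "category.keyword": "cat1" } } } },
        { "constant_score": { "filter": { "term": { "category.keyword": "cat3" } } } },
        { "constant_score": { "filter": { "term": { "category.keyword": "cat5" } } } }
      ]
    }
  },
  "_source": ["category"]
}
```

With default boosts, a document matching all three categories should score 4.0 (1.0 from `match_all` plus 1.0 per matched category), two categories 3.0, one category 2.0, and none 1.0. Documents with the same count tie, so add a secondary sort after `_score` if the order within a tier matters.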

crickes · January 18, 2018, 7:59pm

Hi,

I think you need a minimum_should_match in the second part of your query, to make sure at least 2 of the terms get a hit.

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-minimum-should-match.html

"bool": {
  "should": [
    { "match": { "category": "cat1" } },
    { "match": { "category": "cat3" } },
    { "match": { "category": "cat5" } }
  ],
  "minimum_should_match": 2
}

(not tested)

Steve

iitgrad · January 18, 2018, 9:13pm

Ah, awesome, thanks Crickes. I'll give that a try.

iitgrad · January 18, 2018, 10:38pm

OK, that doesn't work, because it starts excluding documents that don't match both categories. My real question is ultimately about sorting. I may have 25 documents with different categories, and I still want to show all 25. However, I want the documents that contain both categories to score highest, then the docs with just 1 of the 2 categories to score next highest, and then the rest of the docs in any order.

For example, this query insists on ranking docs that don't contain any of the categories higher than, or with the same score as, documents that contain just 1 of the categories.

"should": [
  { "match": { "category.keyword": "cat1" } },
  { "match": { "category.keyword": "cat3" } }
]

Here is a segment of the results I get back:

{
  "_index": "recommendation",
  "_type": "doc",
  "_id": "60e98770-c1b8-11e7-97b7-67350b465bf7",
  "_score": 7.0983133,
  "_source": {
    "allergens": ["cat1", "cat4", "cat5"]
  }
},
{
  "_index": "recommendation",
  "_type": "doc",
  "_id": "3b8f97b0-c1bf-11e7-a56c-3d1dc97f87a0",
  "_score": 7.0610676,
  "_source": {
    "allergens": ["cat4", "cat5", "cat6"]
  }
},
{
  "_index": "recommendation",
  "_type": "doc",
  "_id": "9f5cfe00-ba93-11e7-9f9a-8da6a185fd7a",
  "_score": 7.0610676,
  "_source": {
    "allergens": ["cat3"]
  }
},

crickes · January 19, 2018, 7:41am

Hi,

I think you need to use the 'boost' parameter in the query DSL for this, and possibly the constant_score query.

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-term-query.html <- explains boost
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-constant-score-query.html

You need to boost the relevance of the documents which match 2 of the categories so they score higher than the docs which match just one.

In your results segment above, the doc matching just one 'allergen' scores the same as a doc with 3 allergens ("_score": 7.0610676). If you boost the results from the 'match all 3' section by a factor of 100, the results that match just 2 by a factor of 50, and the results with just 1 match by a factor of 10, then the results should be properly weighted, with the more relevant documents appearing first.

Wrapping each compound query in a constant_score query will allow you to set the relevance score for all the documents matching that query to whatever value you want.
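A sketch of the tiered idea described above, using the example boosts of 100, 50 and 10 from this post and assuming the same `category.keyword` field as elsewhere in the thread. Each tier is a compound filter wrapped in `constant_score` with its own `boost`: all three categories, at least two (via `minimum_should_match`), and at least one (a `terms` filter matches if any listed value is present). The `match_all` in `must` keeps documents that match no category in the results.

```json
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "should": [
        {
          "constant_score": {
            "filter": {
              "bool": {
                "filter": [
                  { "term": { "category.keyword": "cat1" } },
                  { "term": { "category.keyword": "cat3" } },
                  { "term": { "category.keyword": "cat5" } }
                ]
              }
            },
            "boost": 100
          }
        },
        {
          "constant_score": {
            "filter": {
              "bool": {
                "should": [
                  { "term": { "category.keyword": "cat1" } },
                  { "term": { "category.keyword": "cat3" } },
                  { "term": { "category.keyword": "cat5" } }
                ],
                "minimum_should_match": 2
              }
            },
            "boost": 50
          }
        },
        {
          "constant_score": {
            "filter": {
              "terms": { "category.keyword": ["cat1", "cat3", "cat5"] }
            },
            "boost": 10
          }
        }
      ]
    }
  },
  "_source": ["category"]
}
```

Because the tiers overlap, a document matching all three categories also satisfies the "at least 2" and "at least 1" tiers. The expected scores are therefore 161, 61, 11 and 1 (including 1.0 from `match_all`) rather than 100, 50, 10 and 0, but the ordering is the one asked for. Since `constant_score` already removes the BM25 variation, a single `constant_score` clause per category would give the same ordering with less nesting.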

Again, all theoretical and untested on my part.
Hope it helps.

iitgrad · January 19, 2018, 12:32pm

What may not be apparent is that the document in the middle doesn't match any of the query terms, yet it is scored equally with the one that matches one term. This just seems wrong to me, given the way the query is currently structured.
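One way to see where each score comes from is to set `explain` in the search body. Each hit then carries an `_explanation` tree showing which clauses matched and what each contributed. The sketch below reuses the two-category fragment from this thread; substitute the exact query you are actually running, since the full query isn't shown here.

```json
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        { "match": { "category.keyword": "cat1" } },
        { "match": { "category.keyword": "cat3" } }
      ]
    }
  },
  "_source": ["category"]
}
```

The explanation will also show whether the clauses on `category.keyword` match anything at all. That is worth checking here, because the returned `_source` shows an `allergens` field rather than `category`.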

system · February 16, 2018, 12:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
