Do Elasticsearch documents participate in all OR conditions in a bool query?

san88922 · May 23, 2018, 5:12pm

I am new to Elasticsearch. We are working on a requirement where documents are fetched from Elasticsearch based on match types such as fuzzyMatch, wordMatch, etc.

We are facing performance issues. Once a document matches one match type, it should exit the "bool" query, but instead it is evaluated against every match type and the result lists all of them. I tried a nested bool query so that a document would be checked against a single match type and exit when it matched, or otherwise move on to the next match type.

I tried following this blog.

Even this gives me the same result.

Please find the sample code below.

 {
  "size": 10000,
  "query": {
    "function_score": {
      "score_mode": "sum",
      "functions": [
        {
          "filter": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "should": [
                      {
                        "match": {
                          "4": {
                            "fuzziness": "AUTO",
                            "query": "kavan",
                            "minimum_should_match": "50%",
                            "_name": "4.fuzzyMatch"
                          }
                        }
                      },
                      {
                        "match": {
                          "4": {
                            "query": "kavan",
                            "minimum_should_match": "50%",
                            "_name": "4.wordMatch"
                          }
                        }
                      }
                    ]
                  }
                },
                {
                  "bool": {
                    "should": [
                      {
                        "match": {
                          "4.subStringMatch": {
                            "query": "kavan",
                            "minimum_should_match": "50%",
                            "_name": "4.subStringMatch"
                          }
                        }
                      },
                      {
                        "match": {
                          "4.phoneticMatch": {
                            "query": "kavan",
                            "minimum_should_match": "50%",
                            "_name": "4.phoneticMatch"
                          }
                        }
                      }
                    ]
                  }
                },
                {
                  "match": {
                    "4.exactMatch": "kavan"
                  }
                }
              ]
            }
          },
          "weight": 50
        }
      ]
    }
  }
}

Actual output is:

"hits": {
    "total": 94,
    "max_score": 50.0,
    "hits": [{
      "_index": "6_1028",
      "_type": "6_1028",
      "_id": "14",
      "_score": 50.0,
      "_source": {
        "1": 14,
        "@timestamp": "2018-05-18T06:57:02.540Z",
        "4": "kavana",
        "6": "Vastrad",
        "7": "Indiranagar",
        "@version": "1",
        "type": "1028",
        "10": "Bangalore"
      },
      "matched_queries": ["4.phoneticMatch", "4.fuzzyMatch", "4.subStringMatch"]
    }, {
      "_index": "6_1028",
      "_type": "6_1028",
      "_id": "52",
      "_score": 50.0,
      "_source": {
        "1": 52,
        "@timestamp": "2018-05-18T06:57:02.559Z",
        "4": "kavana",
        "6": "Vastrad",
        "7": "Indiranagar",
        "@version": "1",
        "type": "1028",
        "10": "Bangalore"
      },
      "matched_queries": ["4.phoneticMatch", "4.fuzzyMatch", "4.subStringMatch"]
    }, {
      "_index": "6_1028",
      "_type": "6_1028",
      "_id": "53",
      "_score": 50.0,
      "_source": {
        "1": 53,
        "@timestamp": "2018-05-18T06:57:02.559Z",
        "4": "kavana",
        "6": "Vastrad",
        "7": "Indiranagar",
        "@version": "1",
        "type": "1028",
        "10": "Bangalore"
      },
      "matched_queries": ["4.phoneticMatch", "4.fuzzyMatch", "4.subStringMatch"]
    }]
}

Expected output is:

     "hits": {
        "total": 94,
        "max_score": 50.0,
        "hits": [{
          "_index": "6_1028",
          "_type": "6_1028",
          "_id": "14",
          "_score": 50.0,
          "_source": {
            "1": 14,
            "@timestamp": "2018-05-18T06:57:02.540Z",
            "4": "kavana",
            "6": "Vastrad",
            "7": "Indiranagar",
            "@version": "1",
            "type": "1028",
            "10": "Bangalore"
          },
          "matched_queries": ["4.phoneticMatch"]
          }]
     }

With only one match type in matched_queries.

Note: Match types can be seen in the matched_queries block.

Any suggestions in this regard are appreciated.

polyfractal · May 23, 2018, 7:35pm

Is the issue that the query is slow? Or that too many documents are matching the query?

If the query is too slow, I'd recommend not using the Function Score query. You aren't using any of the functions of the function score, so the entire assembly of boolean queries can be expressed without the function_score, which will be much faster.
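As a rough sketch of that, and assuming the functions you left out of the post aren't needed, the same clauses could sit directly in one flat bool query. The field names and values are copied from your post; the flattening of the nested bools is my own simplification, which should match the same documents because everything is a should clause anyway.

```json
{
  "size": 10000,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "4": {
              "query": "kavan",
              "fuzziness": "AUTO",
              "minimum_should_match": "50%",
              "_name": "4.fuzzyMatch"
            }
          }
        },
        {
          "match": {
            "4": {
              "query": "kavan",
              "minimum_should_match": "50%",
              "_name": "4.wordMatch"
            }
          }
        },
        {
          "match": {
            "4.subStringMatch": {
              "query": "kavan",
              "minimum_should_match": "50%",
              "_name": "4.subStringMatch"
            }
          }
        },
        {
          "match": {
            "4.phoneticMatch": {
              "query": "kavan",
              "minimum_should_match": "50%",
              "_name": "4.phoneticMatch"
            }
          }
        },
        {
          "match": {
            "4.exactMatch": "kavan"
          }
        }
      ]
    }
  }
}
```

Note that the scores will no longer be a flat 50: the clauses now run in query context and are scored by relevance. The named clauses are still reported in matched_queries.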

If your query is too permissive (it matches too many documents), you need to change how the query is constructed. Right now you have a large number of should clauses, which are basically optional clauses. A document only has to match one of them, but it may match several.

san88922 · May 24, 2018, 2:36am

Yes @polyfractal, we are seeing a performance issue. We do use functions of the function score that I have not posted here. The problem is that each document is evaluated against every match clause, which is unnecessary overhead. We want only one match type from the should clause to appear in the result, so a document should not be evaluated against the remaining match clauses once it finds one that matches.

Can you please suggest a solution where the match clauses in the should clause are executed conditionally?

polyfractal · May 24, 2018, 2:37pm

If you only want the first matching query to apply, you can set the function_score's score_mode to first. You'll need to re-arrange so that there is a single list of filters, instead of the nested boolean setup you have now.
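Something along these lines is what I mean. Treat it as a sketch, not a drop-in query: the order of the functions and the weights (50 down to 10) are made up for illustration, and I've trimmed the clause options to keep it short. With score_mode set to first, only the first function whose filter matches contributes to the score, and boost_mode replace makes the final score equal to that weight, so the score itself tells you which match type "won".

```json
{
  "size": 10000,
  "query": {
    "function_score": {
      "score_mode": "first",
      "boost_mode": "replace",
      "functions": [
        {
          "filter": { "match": { "4.exactMatch": "kavan" } },
          "weight": 50
        },
        {
          "filter": { "match": { "4": { "query": "kavan", "_name": "4.wordMatch" } } },
          "weight": 40
        },
        {
          "filter": { "match": { "4.phoneticMatch": { "query": "kavan", "_name": "4.phoneticMatch" } } },
          "weight": 30
        },
        {
          "filter": { "match": { "4.subStringMatch": { "query": "kavan", "_name": "4.subStringMatch" } } },
          "weight": 20
        },
        {
          "filter": { "match": { "4": { "query": "kavan", "fuzziness": "AUTO", "_name": "4.fuzzyMatch" } } },
          "weight": 10
        }
      ]
    }
  }
}
```

Two caveats. Like your original, this has no query inside the function_score, so it falls back to match_all and the filters only influence the score, not which documents are returned; add a query if you need to restrict the hits. Also, score_mode only governs scoring, so I would not assume matched_queries shrinks to a single entry.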

There is no way to tell a boolean query to only match the first clause. They are not designed to do that... they will try to match all the clauses according to the boolean logic (must/must_not/should).

Note: if you only want the first to match purely for performance reasons, I think this is misguided. Internally, Lucene uses a leapfrog iterator model, where the sparsest query clause iterates forward, the next query advances its iterator to the same or farther position, etc. Only when all the iterators line up is a match recorded.

So by attempting to only allow one query to match, you may be making performance worse.

If you don't need that behavior for functional reasons, I would suggest just removing the function_score entirely. It is much slower than a simple bool query.
