Only match if all tokens of an indexed field are included in the search query in any order

Dear Elasticsearch Community,

I am currently trying to solve a use case that needs a "reverse AND" condition on user search queries: all tokens of an indexed field have to be included in the query for the document to match.

Example:

Indexed documents have a property field with multiple tokens.

          "fieldXyz" : "quick brown fox"

The search should only match if the user includes ALL of these tokens in their search request; the order should not matter.

So basically the following should NOT match:

"quick"
"quick brown"
"quick fox"
...

However these should match:

"brown quick fox"
"fox quick brown"
...

I was thinking about counting the number of tokens via token_count and storing it in an additional field at index time. However, there is still the challenge of counting the tokens after analysis at query time (since stopwords and so on should not be counted). We cannot do this when processing the query at the application level, since the query is not analysed yet, and calling the analyze API would add a round trip and increase the search response time.
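For illustration, the index-time half of this idea could look roughly like the sketch below. The index name my-index, the subfield name length, and the english analyzer are assumptions made for the example. Setting enable_position_increments to false is meant to keep tokens removed by the analyzer (such as stopwords) out of the count.

```console
# Sketch only: "my-index", the "length" subfield and the "english"
# analyzer are hypothetical choices for this example.
PUT my-index
{
  "mappings": {
    "properties": {
      "fieldXyz": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "english",
            "enable_position_increments": false
          }
        }
      }
    }
  }
}
```

This only covers the indexed side (fieldXyz.length); the number of analysed query tokens is still missing at search time.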

So is there a way to get the token count of the analysed query, maybe from a custom analyzer, and then match it against the field holding the token count?

Hi!
My intuition would be to play with the score: have it calculated so that it reflects whether a document matches your requirements.
See "similarity" for score calculations: Similarity module | Elasticsearch Guide [8.2] | Elastic
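Simple example: with the scripted similarity below, each matching query term contributes doc.freq / doc.length, i.e. its frequency in the field divided by the number of tokens in the field. The match query adds up these per-term scores, so the total reaches 1 only if every term in the field is in the query.
We then filter out the docs that do not reach a score of 1 (with min_score).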

PUT toto
{
  "settings": {
    "number_of_shards": 1,
    "similarity": {
      "scripted_basic": {
        "type": "scripted",
        "script": {
          "source": """
          return doc.freq / doc.length;
          """
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "english", 
        "similarity": "scripted_basic"
      }
    }
  }
}


PUT toto/_doc/1
{
  "field1" : "quick brown fox"
}


GET toto/_search?explain=true
{
  "query": {
    "match": {
      "field1": "Foxes Quick Brown"
    }
  }, 
  "min_score": 1
}
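
To check the negative case, a partial query can be run against the same index. This is a sketch reusing the toto index and field1 from above. With only two of the three field tokens present, the summed score should come out at about 2/3, so min_score should filter the document out.

```console
# Partial query: "fox" is missing, so the score should stay below 1
# and the request should return no hits.
GET toto/_search
{
  "query": {
    "match": {
      "field1": "quick brown"
    }
  },
  "min_score": 1
}
```

Two things I have not verified and that are worth testing: whether repeated query terms (for example "quick quick brown") push the score up to 1, and whether floating-point rounding on longer fields leaves a full match slightly below 1, in which case a marginally lower min_score might be needed.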
