Only match if all tokens of an indexed field are included in the search query in any order

SwonVIP · May 24, 2022, 3:05pm

Dear Elasticsearch Community,

I am currently trying to solve a use case that amounts to a reverse AND condition on user search queries: a document should match only if all tokens of an indexed field are included in the search query.

Example:

Indexed documents have a property field with multiple tokens.

          "fieldXyz" : "quick brown fox"

The search should match only if the user includes ALL of these tokens in the search request; the order should not matter.

So basically the following should NOT match:

"quick"
"quick brown"
"quick fox"
...

However these should match:

"brown quick fox"
"fox quick brown"
...
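
To make the requirement concrete, here is a minimal Python sketch of the condition described above: the field's token set has to be a subset of the query's token set. The helper name `field_fully_covered` is hypothetical, and lowercasing plus whitespace splitting only stands in for a real analyzer (no stemming, no stopword removal).

```python
def field_fully_covered(field_text: str, query_text: str) -> bool:
    """Return True when every token of the indexed field appears in the query."""
    # Assumption: lowercase + whitespace split replaces the real analysis chain.
    field_tokens = set(field_text.lower().split())
    query_tokens = set(query_text.lower().split())
    # "Reverse AND": the field tokens must be a subset of the query tokens.
    return field_tokens <= query_tokens


indexed = "quick brown fox"
for query in ["quick", "quick brown", "quick fox", "brown quick fox", "fox quick brown"]:
    print(f"{query!r} -> {field_fully_covered(indexed, query)}")
```

Under these assumptions the first three queries are rejected and the last two are accepted, regardless of token order.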

I was thinking about counting the tokens with a token_count field that is added at index time. The remaining challenge is counting the query's tokens after analysis at query time (stopwords and the like should not be counted). We cannot do this at the application level, because the query has not been analyzed at that point, and calling the Analyze API would add a round trip and increase the search response time.

So is there a way to return the token count from, say, a custom analyzer and then match it against the field that holds the indexed token count?

vincenbr · May 25, 2022, 8:26am

Hi!
My intuition would be to work with the score: define the score calculation so that the score itself reflects whether your requirement is met.
See "similarity" for score calculations: Similarity module | Elasticsearch Guide [8.2] | Elastic
Simple example: each matching term contributes its frequency in the field divided by the field length, so the score adds up to 1 only if every term in the field is in the query.
We then filter out the docs that do not reach a score of 1 (with min_score).

PUT toto
{
  "settings": {
    "number_of_shards": 1,
    "similarity": {
      "scripted_basic": {
        "type": "scripted",
        "script": {
          "source": """
          return doc.freq / doc.length;
          """
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "english", 
        "similarity": "scripted_basic"
      }
    }
  }
}


PUT toto/_doc/1
{
  "field1" : "quick brown fox"
}


GET toto/_search?explain=true
{
  "query": {
    "match": {
      "field1": "Foxes Quick Brown"
    }
  }, 
  "min_score": 1
}
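
To illustrate the arithmetic behind this approach, here is a rough Python simulation of the summed per-term scores. It is only a sketch under stated assumptions, not how Elasticsearch computes things internally: the function names `simulated_score` and `matches` are hypothetical, whitespace splitting replaces the `english` analyzer (so no stemming of "Foxes" to "fox"), query terms are deduplicated, and exact fractions are used where Lucene works with floats and an encoded field length.

```python
from collections import Counter
from fractions import Fraction


def simulated_score(field_text: str, query_text: str) -> Fraction:
    """Sum freq / length over the distinct query terms found in the field."""
    field_tokens = field_text.lower().split()
    freq = Counter(field_tokens)   # plays the role of doc.freq per term
    length = len(field_tokens)     # plays the role of doc.length
    # Assumption: each distinct query term is scored once.
    query_terms = set(query_text.lower().split())
    return sum(
        (Fraction(freq[term], length) for term in query_terms if term in freq),
        Fraction(0),
    )


def matches(field_text: str, query_text: str, min_score: int = 1) -> bool:
    """Mimic min_score: keep the document only if the summed score reaches it."""
    return simulated_score(field_text, query_text) >= min_score


print(simulated_score("quick brown fox", "quick brown"))  # 2/3, below min_score
print(matches("quick brown fox", "fox quick brown"))      # True
```

Extra query terms that are not in the field add nothing to the sum, so a longer query still matches as long as it covers every field token. Since real scores are floating-point values, it may be worth checking with `explain=true` that a full match actually reaches 1 for your field lengths.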

system · June 22, 2022, 8:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
