How to know which documents in the search results include all of the tokens in the search query (for one particular field)

We are trying to find out which documents in our search results include all the tokens in our search query. For example, consider the following query:

    GET /test/_search
    {
        "query" : {
          "match" : { "message" : { "query": "the brown fox" } }
        }
    }

We would need a response like this, where "all_tokens_match" indicates whether the document contains every token of the query (only the hits array is included, with minimal fields for brevity):

    "hits" : [
      {
        "all_tokens_match": true,
        "_source" : {
          "message" : "The Quick Brown Fox"
        }
      },
      {
        "all_tokens_match": false,
        "_source" : {
          "message" : "The Brown Bear"
        }
      }
    ]

How would you recommend we approach this kind of problem in Elasticsearch? We looked into scripting, but our field is usually indexed as "text" rather than "keyword", and there seem to be some limitations with scripting in that case. If you do recommend scripting, we would be grateful for a snippet to point us in the right direction. Thank you in advance for your support.

Check whether named queries help with your problem.
If you use named queries, you would need to break your search text into one query per token, something like this:

    GET /_search
    {
        "query": {
            "bool" : {
                "should" : [
                    {"match" : { "message" : {"query" : "the", "_name" : "my_query1"} }},
                    {"match" : { "message" : {"query" : "brown", "_name" : "my_query2"} }},
                    {"match" : { "message" : {"query" : "fox", "_name" : "my_query3"} }}
                ]
            }
        }
    }

As a result, you will get something like the following:

    "hits" : [
      {
        "_index" : "my_index",
        ....
        "_source" : {
          "message" : "The Quick Brown Fox"
        },
        "matched_queries" : [
          "my_query1",
          "my_query2",
          "my_query3"
        ]
      },
      {
        "_index" : "my_index",
        ...
        "_source" : {
          "message" : "The Brown Bear"
        },
        "matched_queries" : [
          "my_query1",
          "my_query2"
        ]
      },
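Writing one clause per token by hand does not scale to arbitrary search text. The following is a minimal Python sketch, not part of the reply above: the helper names build_named_token_query and all_tokens_matched are hypothetical, and the search text is split on whitespace, which is an assumption since the field's analyzer may tokenize differently (the _analyze API, e.g. GET /test/_analyze with "field": "message" and "text": "the brown fox", shows the tokens the field actually produces). A hit matched every token when all of the per-token names appear in its matched_queries list.

```python
import json


def build_named_token_query(field: str, tokens: list[str]) -> dict:
    """Hypothetical helper: one named match clause per token, combined in bool/should.

    Clause names follow the my_query1, my_query2, ... pattern used above.
    """
    should = [
        {"match": {field: {"query": token, "_name": f"my_query{i}"}}}
        for i, token in enumerate(tokens, start=1)
    ]
    return {"query": {"bool": {"should": should}}}


def all_tokens_matched(hit: dict, query_names: list[str]) -> bool:
    """True if every per-token query name appears in the hit's matched_queries."""
    matched = hit.get("matched_queries", [])
    return all(name in matched for name in query_names)


if __name__ == "__main__":
    # Naive tokenization: split on whitespace (an assumption, see above).
    body = build_named_token_query("message", "the brown fox".split())
    print(json.dumps(body, indent=4))

    names = [c["match"]["message"]["_name"] for c in body["query"]["bool"]["should"]]
    # Hits trimmed to the fields shown in the response above.
    hits = [
        {"_source": {"message": "The Quick Brown Fox"},
         "matched_queries": ["my_query1", "my_query2", "my_query3"]},
        {"_source": {"message": "The Brown Bear"},
         "matched_queries": ["my_query1", "my_query2"]},
    ]
    for hit in hits:
        print(hit["_source"]["message"], all_tokens_matched(hit, names))
```

If the analyzer removes a token entirely (for example a stopword filter dropping "the"), that clause matches no documents by default, so its name should be left out of the list being checked.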

Thank you, @mayya, for pointing us toward named queries. That solved our problem once we combined named queries with the "minimum_should_match" parameter and set "boost": 0 on the extra clause so that it does not affect the score of our main query.

We did it like this:

    {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "message": {
                                "query": "the brown fox",
                                "_name": "main_query"
                            }
                        }
                    },
                    {
                        "match": {
                            "message": {
                                "query": "the brown fox",
                                "_name": "all_tokens_match",
                                "minimum_should_match": "100%",
                                "boost": 0
                            }
                        }
                    }
                ]
            }
        }
    }

So we receive a response like this:

    {
        "hits": [
            {
                "_score": 0.99938476,
                "_source": {
                    "message": "The Quick Brown Fox"
                },
                "matched_queries": [
                    "main_query",
                    "all_tokens_match"
                ]
            },
            {
                "_score": 0.38727614,
                "_source": {
                    "message": "The Brown Bear"
                },
                "matched_queries": [
                    "main_query"
                ]
            }
        ]
    }

Documents that contain all of the tokens in our query now have "all_tokens_match" listed in the "matched_queries" part of the response.
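To reproduce the boolean "all_tokens_match" field from the original question, the two-clause query can be built from the search text and the hits post-processed on the client side. The following is a minimal Python sketch, assuming hits shaped like the response above and matched_queries returned as a list of names; build_all_tokens_query and annotate_hits are hypothetical helper names, and the query they build simply mirrors the JSON posted above.

```python
import json


def build_all_tokens_query(field: str, text: str) -> dict:
    """Hypothetical helper mirroring the two-clause query above."""
    main_clause = {"match": {field: {"query": text, "_name": "main_query"}}}
    flag_clause = {
        "match": {
            field: {
                "query": text,
                "_name": "all_tokens_match",
                # Matches only documents that contain every analyzed query term.
                "minimum_should_match": "100%",
                # Zero boost: the clause flags hits without changing _score.
                "boost": 0,
            }
        }
    }
    return {"query": {"bool": {"should": [main_clause, flag_clause]}}}


def annotate_hits(hits: list[dict]) -> list[dict]:
    """Return copies of the hits with a leading "all_tokens_match": True/False key."""
    return [
        {"all_tokens_match": "all_tokens_match" in hit.get("matched_queries", []), **hit}
        for hit in hits
    ]


if __name__ == "__main__":
    print(json.dumps(build_all_tokens_query("message", "the brown fox"), indent=4))

    # Hits copied from the response above.
    hits = [
        {"_score": 0.99938476,
         "_source": {"message": "The Quick Brown Fox"},
         "matched_queries": ["main_query", "all_tokens_match"]},
        {"_score": 0.38727614,
         "_source": {"message": "The Brown Bear"},
         "matched_queries": ["main_query"]},
    ]
    for hit in annotate_hits(hits):
        print(hit["_source"]["message"], hit["all_tokens_match"])
```

Setting "operator": "and" on the flag clause should behave the same as "minimum_should_match": "100%" for a single-field match query; either way, the clause serves only as a marker in matched_queries.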

