Exact match on fields of type "text" (beginning and end "anchored")

Hi

Is there a way to do an exact match on fields of type "text"? By that I mean ensuring that all tokens are matched in the order specified and that matching documents contain no additional tokens (no partial match). I know the "keyword" type could help here, but it is not wanted/available in this case. I already searched a bit and found things like https://stackoverflow.com/questions/30517904/elasticsearch-exact-matches-on-analyzed-fields and "Exact match search on text field". There must be some trick to do this. Am I missing something?

Consider this example. Only two results should be returned, but the query below returns all three documents. A solution with Query DSL is also fine.

PUT ypid-exact-match-of-text-test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}
POST ypid-exact-match-of-text-test/_doc
{
  "text": "You Know, for Search"
}
POST ypid-exact-match-of-text-test/_doc
{
  "text": "Elastic Stack"
}
POST ypid-exact-match-of-text-test/_doc
{
  "text": "You Know, for Search (Elastic Stack)"
}
GET ypid-exact-match-of-text-test/_search?filter_path=hits.hits._source.text
{
  "query": {
    "query_string": {
      "query": """text:("You Know, for Search" OR "Elastic Stack")"""
    }
  }
}

I also tried min_score as a workaround, but the default scoring is not suitable for that. Even with Painless this does not seem trivial, since a Painless script would also benefit from the keyword type.

Hi @ypid-geberit,

I think one solution for your use case would be a normalizer. It is a kind of analyzer for keyword fields that always produces a single token. https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html
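
As a rough sketch of that idea, the text field could get a keyword sub-field with a lowercasing normalizer. The index name, the sub-field name "raw" and the normalizer name "lowercase_normalizer" are made up for illustration and are not from your setup. A term query on the sub-field should then only match documents whose whole value is equal, ignoring case:

# Hypothetical index, sub-field and normalizer names
PUT ypid-exact-match-normalizer-test
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

# The normalizer is also applied to the term query input at search time
GET ypid-exact-match-normalizer-test/_search?filter_path=hits.hits._source.text
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text.raw": "You Know, for Search" } },
        { "term": { "text.raw": "Elastic Stack" } }
      ]
    }
  }
}

Keep in mind that this normalizer only lowercases, so punctuation and whitespace still have to match exactly ("You Know, for Search" is not the same value as "You Know for Search").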

If you really don't want to use keyword for your use case, the book Relevant Search by Doug Turnbull and John Berryman proposes an interesting solution to this problem: sentinel tokens. You add marker tokens at the boundaries of the text, both at ingest time and at search time.

PUT ypid-exact-match-of-text-test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

POST ypid-exact-match-of-text-test/_doc
{
  "text": "SENTINEL_BEGIN You Know, for Search SENTINEL_END"
}

POST ypid-exact-match-of-text-test/_doc
{
  "text": "SENTINEL_BEGIN Elastic Stack SENTINEL_END"
}

POST ypid-exact-match-of-text-test/_doc
{
  "text": "SENTINEL_BEGIN You Know, for Search (Elastic Stack) SENTINEL_END"
}
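
If adding the tokens by hand is not practical, one option might be an ingest pipeline that wraps the field at index time. This is only a minimal sketch: the pipeline name "add-sentinel-tokens" is made up here, and it assumes every document has a "text" field.

# Hypothetical pipeline name; wraps the original value of "text" in sentinel tokens
PUT _ingest/pipeline/add-sentinel-tokens
{
  "processors": [
    {
      "set": {
        "field": "text",
        "value": "SENTINEL_BEGIN {{{text}}} SENTINEL_END"
      }
    }
  ]
}

# Index the plain text and let the pipeline add the tokens
POST ypid-exact-match-of-text-test/_doc?pipeline=add-sentinel-tokens
{
  "text": "Elastic Stack"
}

Note that the pipeline changes the stored _source as well, so the sentinel tokens will also show up when the documents are displayed.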

The search part would also have the sentinel tokens. To match the phrase exactly, you would use match_phrase queries combined in a bool query, as in the example below.

GET ypid-exact-match-of-text-test/_search?filter_path=hits.hits._source.text
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "text": "SENTINEL_BEGIN You Know, for Search SENTINEL_END"
          }
        },
        {
          "match_phrase": {
            "text": "SENTINEL_BEGIN Elastic Stack SENTINEL_END"
          }
        }
      ]
    }
  }
}
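
Since the original question used query_string, the same search can presumably be written in that syntax too. Quoted parts in query_string are treated as phrase queries, so under that assumption this should behave like the bool query above:

# Same search as above, written as query_string (sketch, not tested against your data)
GET ypid-exact-match-of-text-test/_search?filter_path=hits.hits._source.text
{
  "query": {
    "query_string": {
      "query": """text:("SENTINEL_BEGIN You Know, for Search SENTINEL_END" OR "SENTINEL_BEGIN Elastic Stack SENTINEL_END")"""
    }
  }
}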

Thanks for the clarification. So I am not missing anything. Some form of preparation at index time is needed so that this query can be answered. The trick with sentinel tokens is interesting but probably not a good idea to do by default for the typical logging use case (where Kibana is used).
