Need Some Help Understanding Match Query Behavior

I'm confused why the match query seen below is matching two documents rather than just one. I thought using the "and" operator would require all terms to be present in order for it to match.

When I hit the explain endpoint (GET people/_explain/2) with the id of the document I do not expect to be there, I see the description mentioning synonyms, which seems unexpected to me.

weight(Synonym(email:john email:john.smith email:smith) in 1) [PerFieldSimilarity]

Why is tom.smith@gmail.com showing up in the results?

DELETE people

PUT people
{
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "filter": [
            "email_filter",
            "lowercase",
            "unique"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "email_filter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            """(\p{L}+)""",
            """(\d+)""",
            "@(.+)"
          ]
        }
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "people", "_id" : "1" } }
{ "email" : "john.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "2" } }
{ "email" : "tom.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "3" } }
{ "email" : "mike.wozowski@gmail.com" }

GET people/_analyze
{
  "text": "tom.smith@gmail.com",
  "field": "email"
}


GET people/_search
{
  "query": {
    "match": {
      "email": {"query": "john.smith", "operator": "and"}
    }
  }
}

Well, I found some documentation explaining why it does this, but I'm not sure what the best way forward is. I'd like the "and" operator to behave the way it normally does, requiring every term to match.

https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-pattern-capture-tokenfilter.html

Note: All tokens are emitted in the same position, and with the same character offsets. This means, for example, that a match query for john-smith_123@foo-bar.com that uses this analyzer will return documents containing any of these tokens, even when using the and operator. Also, when combined with highlighting, the whole original token will be highlighted, not just the matching subset. For instance, querying the above email address for "smith" would highlight:
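A rough Python model of the first analyzer may make that note more concrete. This is only a sketch, not Elasticsearch code: `analyze` and `matches_with_and` are hypothetical helpers, the tokenizer is assumed to split these inputs only at `@`, and `[^\W\d_]+` stands in for `\p{L}+` because Python's `re` lacks `\p{L}`. The idea it illustrates is that tokens sharing a position are OR-ed together (the `Synonym(...)` in the explain output), and "and" only applies across positions.

```python
import re

# Stand-ins for the four pattern_capture patterns in the mapping above.
# [^\W\d_]+ approximates \p{L}+ (one or more letters).
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def analyze(text):
    """Return one set of tokens per position (simplified model)."""
    positions = []
    # Assumption: for these inputs the standard tokenizer splits only at "@".
    for token in text.lower().split("@"):
        same_position = {token}  # preserve_original: true
        for pattern in PATTERNS:
            for m in re.finditer(pattern, token):
                # Every capture is emitted at the original token's position.
                same_position.add(m.group(1))
        positions.append(same_position)
    return positions


def matches_with_and(query, document):
    """AND across positions, OR among the tokens that share a position."""
    indexed_terms = set().union(*analyze(document))
    return all(position & indexed_terms for position in analyze(query))


print(analyze("john.smith") == [{"john.smith", "john", "smith"}])      # True
print(matches_with_and("john.smith", "tom.smith@gmail.com"))           # True
print(matches_with_and("john.smith", "mike.wozowski@gmail.com"))       # False
```

In this model the query "john.smith" is a single position holding three alternatives, so the shared token `smith` is enough for tom.smith@gmail.com to match.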

Changing the analyzer to the following seems like it will suit my needs.

DELETE people

PUT people
{
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "filter": [
            "lowercase",
            "email_parts_filter",
            "3_6_edge_ngram",
            "unique"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "email_parts_filter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "@(.+)"
          ]
        },
        "3_6_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 6
        }
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "people", "_id" : "1" } }
{ "email" : "john.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "2" } }
{ "email" : "tom.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "3" } }
{ "email" : "mike.wozowski@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "4" } }
{ "email" : "mike.smith-666@gmail.com" }

GET people/_analyze
{
  "text": "tom.smith@gmail.com",
  "field": "email"
}


GET people/_search
{
  "query": {
    "match": {
      "email": {"query": "john.sm", "operator": "and"}
    }
  }
}
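The same kind of rough Python model can be applied to the revised analyzer. Again this is a sketch under assumptions, not Elasticsearch code: `edge_ngrams`, `analyze`, and `matches_with_and` are hypothetical helpers, the tokenizer is assumed to split these inputs only at `@`, and the `pattern_capture` step is left out because with these two patterns it adds nothing new once `@` is gone. It assumes the edge n-grams of one token all share that token's position.

```python
def edge_ngrams(token, min_gram=3, max_gram=6):
    """Prefixes of length min_gram..max_gram, as in the 3_6_edge_ngram filter."""
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def analyze(text):
    """Return one set of n-grams per position (simplified model)."""
    # Assumption: for these inputs the standard tokenizer splits only at "@".
    positions = [edge_ngrams(t) for t in text.lower().split("@")]
    # Tokens shorter than min_gram produce no grams and drop out.
    return [p for p in positions if p]


def matches_with_and(query, document):
    """AND across positions, OR among the n-grams that share a position."""
    indexed_terms = set().union(*analyze(document))
    return all(position & indexed_terms for position in analyze(query))


print(matches_with_and("john.sm", "john.smith@gmail.com"))  # True
print(matches_with_and("john.sm", "tom.smith@gmail.com"))   # False
print(matches_with_and("johnny", "john.smith@gmail.com"))   # True
```

Under this model "john.sm" no longer matches tom.smith@gmail.com, because the two addresses share no 3 to 6 character prefix. The last line suggests that n-grams of one query token are still alternatives rather than all required, so a query such as "johnny" may still match john.smith@gmail.com through the shared prefixes `joh` and `john`. Running `_analyze` on the query text is a way to check the actual positions.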
