Need Some Help Understanding Match Query Behavior

denvaar · January 11, 2023, 6:14am

I'm confused about why the match query below matches two documents rather than just one. I thought the "and" operator required all terms to be present for a document to match.

When I hit the explain endpoint (GET people/_explain/2) with the id of the document I don't expect in the results, the description mentions synonyms, which I did not expect:

weight(Synonym(email:john email:john.smith email:smith) in 1) [PerFieldSimilarity]

Why is tom.smith@gmail.com showing up in the results?

DELETE people

PUT people
{
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "filter": [
            "email_filter",
            "lowercase",
            "unique"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "email_filter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            """(\p{L}+)""",
            """(\d+)""",
            "@(.+)"
          ]
        }
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "people", "_id" : "1" } }
{ "email" : "john.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "2" } }
{ "email" : "tom.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "3" } }
{ "email" : "mike.wozowski@gmail.com" }

GET people/_analyze
{
  "text": "tom.smith@gmail.com",
  "field": "email"
}
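
To see which filter emits each token and which position it is assigned, a similar request can be run in explain mode. This is a sketch assuming the `people` index above; it names the analyzer explicitly instead of the field, which should resolve to the same analyzer here.

```
GET people/_analyze
{
  "analyzer": "email_analyzer",
  "text": "tom.smith@gmail.com",
  "explain": true
}
```

Tokens that come back with the same `position` value are the ones a match query treats as alternatives for one another.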


GET people/_search
{
  "query": {
    "match": {
      "email": {"query": "john.smith", "operator": "and"}
    }
  }
}
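
The way the match query is rewritten can also be checked without running the search. A minimal sketch using the validate API with `explain=true`, assuming the `people` index above:

```
GET people/_validate/query?explain=true
{
  "query": {
    "match": {
      "email": {"query": "john.smith", "operator": "and"}
    }
  }
}
```

If the explanation shows a single `Synonym(...)` clause, as in the `_explain` output quoted earlier, the analyzer produced all of the query tokens at one position, so the "and" operator has only one clause to require.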

denvaar · January 11, 2023, 7:04am

Well, I found some documentation explaining why it does this, but I'm not sure what the best way forward is. I'd like the "and" operator to behave the way it normally does.

https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-pattern-capture-tokenfilter.html

Note: All tokens are emitted in the same position, and with the same character offsets. This means, for example, that a match query for john-smith_123@foo-bar.com that uses this analyzer will return documents containing any of these tokens, even when using the and operator. Also, when combined with highlighting, the whole original token will be highlighted, not just the matching subset. For instance, querying the above email address for "smith" would highlight:

denvaar · January 11, 2023, 7:10am

Changing the analyzer to the following seems to suit my needs.

DELETE people

PUT people
{
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "filter": [
            "lowercase",
            "email_parts_filter",
            "3_6_edge_ngram",
            "unique"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "email_parts_filter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "@(.+)"
          ]
        },
        "3_6_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 6
        }
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "people", "_id" : "1" } }
{ "email" : "john.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "2" } }
{ "email" : "tom.smith@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "3" } }
{ "email" : "mike.wozowski@gmail.com" }
{ "index" : { "_index" : "people", "_id" : "4" } }
{ "email" : "mike.smith-666@gmail.com" }

GET people/_analyze
{
  "text": "tom.smith@gmail.com",
  "field": "email"
}


GET people/_search
{
  "query": {
    "match": {
      "email": {"query": "john.sm", "operator": "and"}
    }
  }
}
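
Running the query text through the new analyzer shows what the search actually looks for. A sketch, assuming the index defined above:

```
GET people/_analyze
{
  "analyzer": "email_analyzer",
  "text": "john.sm"
}
```

With `min_gram` 3 and `max_gram` 6, the edge n-grams are prefixes of at most six characters, so `john.sm` itself should not be among them. As far as I know, `edge_ngram` also emits its grams at the position of the token they came from, so the note about same-position tokens quoted above likely still applies. The query stops matching tom.smith@gmail.com because none of the query's prefixes should occur in that document, not because "and" now requires every token.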

