Exact match in query

We currently use the following query:


{
  "query": {
    "bool": {
      "should": [],
      "must": [
        {
          "multi_match": {
            "query": "but not both",
            "type": "phrase",
            "fields": [
              "field_1",
              "field_2",
              "field_3^1.5",
              "source_id^4.0",
              "ui_ref^4.0"
            ]
          }
        }
      ],
      "filter": {
        "terms": {
          "index_group": [
             // indexes
          ]
        }
      }
    }
  }
}

and we want to modify it to return only exact matches of "but not both". We want the match to be case-insensitive and to require that the words appear in exactly that order, i.e. "yes, but not both instances" should be a match while "but, you see, both images do not" should not. How should we modify it?

Hey @mariana.upcodes:

You can check the match_phrase query. It takes into account the position of each of the words.

For case-insensitive matches, I'd suggest you use the lowercase token filter and include it in the analyzers used for your fields.
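To illustrate the idea only, here is a toy Python sketch of position-based phrase matching with lowercasing. It is not how Elasticsearch is implemented; the `analyze` and `phrase_match` helpers are hypothetical names, and the tokenizer (split on non-letters, then lowercase) is an assumption that only roughly resembles a letter-based analyzer.

```python
import re


def analyze(text):
    """Toy analyzer: keep runs of letters and lowercase them."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]


def phrase_match(doc_text, phrase):
    """Return True if the phrase's tokens occur consecutively and in order."""
    doc = analyze(doc_text)
    query = analyze(phrase)
    if not query:
        return False
    n = len(query)
    # Slide a window over the document tokens and compare with the query tokens.
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


# The two example sentences from the question.
print(phrase_match("Yes, BUT not both instances", "but not both"))
print(phrase_match("but, you see, both images do not", "but not both"))
```

The first call reports a match because the three words sit next to each other in the right order, regardless of case; the second does not, because the same words are scattered.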

I hope that helps!

Hey Carlos, thanks for your answer! I tried a simple test with match_phrase as you suggested, but it's not working as expected. I get a lot of results, but not a single one contains "but not both", which is what I'm trying to match.

Please show exactly what you tried as well as what was and was not returned as expected. It would be great if you also could include the mappings of the fields involved in the query.

Tried this query:

{
  "query": {
    "bool": {
      "should": [],
      "must": [
        {
          "match_phrase": {
            "body": "but not both"
          }
        }
      ],
      "filter": {
        "terms": {
          "index_group": [
            // indexes here
          ]
        }
      }
    }
  }
}

As for the examples, I don't think I can disclose them. But basically, I only want instances in which "but not both" appears exactly as is in the text.

This is the mapping of the field I was testing:

"body": {
  "type": "text",
  "fields": {
    "simple": {
      "type": "text",
      "analyzer": "simple"
    }
  }
}

Could it have something to do with that?

If you cannot show the actual documents, please create a made-up query and sample document(s) that can be used to recreate and showcase the issue.

I already gave the example query and the actual mapping in my previous comment!

This is a sample document I'm getting in the results, but I don't want it to appear:

{
  "_source": {
    "body": "[...] on both [...] in both [...] on both [...] not [...] on both [...]",
    "ui_ref": "407.2.3",
    "source_id": "407.2.3"
  }
}

So, it basically has some occurrences of "both" and one of "not", but they're not together as expected, nor is there any occurrence of "but".

That's a red flag to me. It's surely not that hard to remove any sensitive information from a small sample of your documents, or create a few sample dummy documents.

Please create a small dummy document with some text in the body field that can be used to demonstrate the issue. When you blank out text in a string the way you have done, it is impossible for us to analyse or recreate the issue. For all we know, the matching component may exist in the parts you have blanked out...

When we have access to a dummy document and associated query that replicates the problem, we can see how it is analysed and help better. Without this, it is a lot of time-consuming guesswork and a waste of time.

I ran the following example, and the query did not find the document.

PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "fields": {
          "simple": {
            "type": "text",
            "analyzer": "simple"
          }
        }
      }
    }
  }
}

POST /test/_doc/1
{
  "body": "default on both default in both default on both default not default on both default"
}

GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "body": "but not both"
          }
        }
      ]
    }
  }
}

Please create a minimal example like this that clearly demonstrates the issue.

Ok, I'm trying to create a minimal example as suggested, but even when I create the exact same document in a new index, the query doesn't return the document :thinking: Are there any other components here that could be affecting the result besides the mapping?

EDIT: I just found that the problem lay with the analyzer. It was removing "but" and "not" since they are stopwords. If I use "body.simple" in the query, it works as expected. Thank you so much for your time!
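The effect described in the EDIT can be reproduced in a toy Python model, assuming the field's analyzer drops stopwords but keeps the position gaps they leave behind. This is only a sketch: the `STOPWORDS` set is a small illustrative list rather than the analyzer's real one, and `analyze` and `phrase_match` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

# Assumption: a small illustrative stopword list, not the analyzer's actual list.
STOPWORDS = frozenset({"but", "not", "on", "in"})


def analyze(text, stopwords=frozenset()):
    """Return (position, token) pairs; removed stopwords leave a gap in positions."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in stopwords]


def phrase_match(doc_text, phrase, stopwords=frozenset()):
    """Phrase match on the tokens that survive analysis, using relative positions."""
    doc = set(analyze(doc_text, stopwords))
    query = analyze(phrase, stopwords)
    if not query:
        return False
    first_pos = query[0][0]
    n_tokens = len(re.findall(r"[a-z]+", doc_text.lower()))
    # Try every start offset; each surviving query term must sit at its relative position.
    return any(
        all((start + pos - first_pos, tok) in doc for pos, tok in query)
        for start in range(n_tokens)
    )


doc = ("default on both default in both default on both "
       "default not default on both default")

# With stopwords removed, the query is effectively just the single term "both".
print(analyze("but not both", STOPWORDS))
print(phrase_match(doc, "but not both", STOPWORDS))
# With all three words kept, the document no longer matches.
print(phrase_match(doc, "but not both"))
```

Under these assumptions, the stopword-removing variant matches any document containing "both", while the variant that keeps every word only matches when "but not both" appears as a whole.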

Mmm, to my understanding, the analyzer was working completely consistently with its own documentation. So personally, I think this mis-states where the problem was ...

But I am certainly glad you now understand things better.

I tested the example query against both fields in my example and the document was not found with either analyser. It therefore seems you were sloppy and posted an incorrect mapping when asked for it, and so wasted everyone's time. If we had seen that an analyser which removes stopwords was being used, the example and the result would have been clear immediately.

This is why you should ALWAYS condense the problem down to a minimal, full and reproducible example. Often that also helps you find the issue.
