Return documents that match a minimum number of words in the same sentence

I need to return documents that match at least N words in the same sentence.

I split each document into sentences and index each sentence as a separate value of an array field, like so:

PUT /test_index/_doc/id1
{
  "texts": [
    "Your first step is the subject line.",
    "You will have just seconds to gain the full attention of your reader."
  ]
}

I leave position_increment_gap at its default of 100, so consecutive sentences are separated by a gap of 100 positions in the index.
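For reference, leaving the default is equivalent to creating the index with an explicit mapping like this one (a sketch, assuming texts is an ordinary text field; 100 is the documented default value of position_increment_gap):

```
PUT /test_index
{
  "mappings": {
    "properties": {
      "texts": {
        "type": "text",
        "position_increment_gap": 100
      }
    }
  }
}
```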

Let's say I need to match a minimum of 2 words.
The document should be returned if I search for the terms ("bla", "attention", "reader") but not for ("bla", "subject", "reader"). "bla" is not in the document, "attention" and "reader" are in the same sentence, and "subject" and "reader" are not.

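A bool query with should clauses and minimum_should_match does not work. All array values are indexed into the same texts field, so the query cannot tell which sentence a term came from, and it returns the document even though "subject" and "reader" are in different sentences: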

GET /test_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "texts": "subject" } },
        { "term": { "texts": "reader" } }
      ],
      "minimum_should_match": 2
    }
  }
}

So I need a way to combine proximity with minimum_should_match.
Is there a way to achieve that?
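For comparison, proximity on its own can be expressed with a match_phrase query whose slop is smaller than position_increment_gap. This is a sketch against the flat texts field above; the slop of 10 is an arbitrary illustrative value. Because of the 100-position gap between sentences, it can only match terms that sit in the same sentence, but it requires every term to be present, so by itself it cannot express "at least N of these words":

```
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "texts": {
        "query": "attention reader",
        "slop": 10
      }
    }
  }
}
```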

The nested type could be a solution here. When every sentence is indexed as a nested object, Elasticsearch stores each sentence as its own hidden document, so a nested query is evaluated against each sentence independently.

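First, (re)create the index with texts mapped as a nested field. The type of an existing field cannot be changed in place, so the index from the question has to be deleted or a new one created: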

PUT test_index
{
  "mappings": {
    "properties": {
      "texts": {
        "type": "nested"
      }
    }
  }
}
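Alternatively, the inner text field can be declared explicitly instead of being left to dynamic mapping (a sketch; the sub-field name text matches the documents indexed in the next step):

```
PUT test_index
{
  "mappings": {
    "properties": {
      "texts": {
        "type": "nested",
        "properties": {
          "text": { "type": "text" }
        }
      }
    }
  }
}
```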

Next, index the document with a slightly different structure, in which each sentence becomes an object with a text field:

PUT /test_index/_doc/id1
{
  "texts": [
    {
      "text": "Your first step is the subject line."
    },
    {
      "text": "You will have just seconds to gain the full attention of your reader."
    }
  ]
}

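Now you can use a nested query. Because each sentence is its own nested object, minimum_should_match is applied per sentence: the first search returns the document, since "attention" and "reader" are in the same sentence, while the second returns nothing, since no single sentence contains two of "bla", "subject" and "reader":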

GET /test_index/_search
{
  "query": {
    "nested": {
      "path": "texts",
      "query": {
        "match": {
          "texts.text": {
            "query": "bla attention reader",
            "minimum_should_match": 2
          }
        }
      }
    }
  }
}

GET /test_index/_search
{
  "query": {
    "nested": {
      "path": "texts",
      "query": {
        "match": {
          "texts.text": {
            "query": "bla subject reader",
            "minimum_should_match": 2
          }
        }
      }
    }
  }
}
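To check which sentence satisfied the query, you can add inner_hits to the nested query. This is a sketch based on the first search above; with inner_hits enabled, each hit also lists the matching nested objects (here, the matching sentence) in its inner_hits section:

```
GET /test_index/_search
{
  "query": {
    "nested": {
      "path": "texts",
      "query": {
        "match": {
          "texts.text": {
            "query": "bla attention reader",
            "minimum_should_match": 2
          }
        }
      },
      "inner_hits": {}
    }
  }
}
```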

Thanks Abdon, that's great. I overlooked this "nested type" feature.
