I need to return documents that match at least N words in the same sentence.
I split my documents per sentence and index each one as a separate value like so:
PUT /test_index/_doc/id1
{
  "texts": [
    "Your first step is the subject line.",
    "You will have just seconds to gain the full attention of your reader."
  ]
}
and leave position_increment_gap at its default of 100.
Let's say I need to match a minimum of 2 words.
I need the document to be returned if I search for the terms ("bla", "attention", "reader"), but not for ("bla", "subject", "reader"): "bla" is not in the document, "attention" and "reader" are in the same sentence, and "subject" and "reader" are not.
The approach with a bool query using should clauses and minimum_should_match does not work, as this query returns the document when it shouldn't:
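The query itself is missing from this copy of the question. A sketch of what it presumably looked like, assuming one match clause per search term against the texts field and the terms from the case that should not match:

```console
GET /test_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "texts": "bla" } },
        { "match": { "texts": "subject" } },
        { "match": { "texts": "reader" } }
      ],
      "minimum_should_match": 2
    }
  }
}
```

Both sentences are indexed into the same texts field of the same document, so "subject" and "reader" each satisfy a clause even though they come from different sentences, and id1 is returned.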
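One way to solve this is to map texts as a nested field, so that each sentence is indexed as its own sub-document. Next, you index your document, using a slightly different structure in which each sentence is an object with a text field: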
PUT /test_index/_doc/id1
{
  "texts": [
    {
      "text": "Your first step is the subject line."
    },
    {
      "text": "You will have just seconds to gain the full attention of your reader."
    }
  ]
}
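The mapping request is not part of this copy of the answer. It has to be in place when the index is created, that is, before the document above is indexed. A minimal sketch, assuming Elasticsearch 7 or later (typeless mappings) and a plain text sub-field with the default analyzer:

```console
PUT /test_index
{
  "mappings": {
    "properties": {
      "texts": {
        "type": "nested",
        "properties": {
          "text": { "type": "text" }
        }
      }
    }
  }
}
```

An existing texts field cannot be switched from text to nested in place, so if test_index already exists from the earlier attempt it needs to be deleted and recreated, or a new index name used.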
Now, you can use the nested query to get to the desired results:
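The query is also missing here. A minimal sketch of a nested query for this case, assuming the field names texts and texts.text from the document above and the same should / minimum_should_match logic as in the question:

```console
GET /test_index/_search
{
  "query": {
    "nested": {
      "path": "texts",
      "query": {
        "bool": {
          "should": [
            { "match": { "texts.text": "bla" } },
            { "match": { "texts.text": "attention" } },
            { "match": { "texts.text": "reader" } }
          ],
          "minimum_should_match": 2
        }
      }
    }
  }
}
```

The inner bool query is evaluated against each nested object separately, so the two required matches have to come from the same sentence. With the terms above, the second sentence satisfies two clauses and id1 is returned; replacing "attention" with "subject" leaves no single sentence with two matches, so the document is not returned.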