Find documents containing no more terms than the query, with fuzziness

Let's assume there are three documents in the database:

  1. {name: "John"}
  2. {name: "John Brown"}
  3. {name: "Brown"}

I would like to search documents in the following way:

  1. Search: "John". Result: document 1
  2. Search: "John Brown". Result: document 1 and document 2 and document 3
  3. Search: "Brown John". Result: document 1 and document 2 and document 3
  4. Search: "John Brown Abc". Result: document 1 and document 2 and document 3
  5. Search: "Brown Abc". Result: document 3

General scenario: I would like to find all documents whose words are all included among the words used in the query. The query can have more words than a document, but a document can't contain words that are not used in the query.
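To pin the rule down, here is a small sketch in plain Python (no Elasticsearch involved). Lowercasing and whitespace splitting are simplifying assumptions standing in for a real analyzer, and the helper names are hypothetical. Each name is treated as a set of words, and a document is kept only when that set is a subset of the query's words. It reproduces the five cases listed above.

```python
# Plain-Python model of the requirement: every word of the document must
# also appear in the query (the query may contain extra words).

DOCS = {1: "John", 2: "John Brown", 3: "Brown"}


def words(text: str) -> set[str]:
    """Lowercase and split on whitespace, a rough stand-in for an analyzer."""
    return {w.lower() for w in text.split()}


def is_match(query: str, doc_name: str) -> bool:
    """True if the document's words are a subset of the query's words."""
    return words(doc_name) <= words(query)


def search(query: str, docs: dict[int, str]) -> list[int]:
    """Return the ids of matching documents in ascending id order."""
    return [doc_id for doc_id, name in sorted(docs.items()) if is_match(query, name)]


if __name__ == "__main__":
    for q in ["John", "John Brown", "Brown John", "John Brown Abc", "Brown Abc"]:
        print(q, "->", search(q, DOCS))
```

Because names are compared as sets, a name like "John Brown Brown" also counts as a match for "John Brown".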

I would take a look at the match query in combination with minimum_should_match as a starter.

It sounds like you may need to use a terms set query.
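A sketch of what such a terms_set query could look like, built here as a Python dict. The fields `name_terms` (a keyword array of the distinct lowercased words of the name) and `name_term_count` (an integer with the number of those words) are hypothetical and would have to be populated at index time. The idea: terms_set requires a document to contain at least N of the supplied terms, and taking N from the document's own word count means every word of the document has to be among the query terms.

```python
import json


def build_terms_set_query(user_query: str) -> dict:
    """Build a terms_set query body for the "document words within query" rule.

    Assumes hypothetical fields:
      name_terms      -> keyword array of distinct lowercased words of "name"
      name_term_count -> integer, number of entries in name_terms
    """
    # Deduplicate and lowercase so the terms line up with name_terms.
    terms = sorted({w.lower() for w in user_query.split()})
    return {
        "query": {
            "terms_set": {
                "name_terms": {
                    "terms": terms,
                    # Per-document threshold: all of the document's words must match.
                    "minimum_should_match_field": "name_term_count",
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_terms_set_query("John Brown Abc"), indent=2))
```

Being a term-level query, this matches terms exactly, so it does not cover fuzziness.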

I tried something like that, but the problem was the first case (Search: "John". Result: document 1). The "name" field is of the text type, so the query returns all documents containing the term John, for example "John Brown", "John Abc" ..., while it should only return documents whose value is "John" and nothing more. On the other hand, if the field were of type "keyword", case 4 would not work.

The query that I used:

{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "John"}}
      ],
      "minimum_should_match": 1
    }
  }
}
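A rough simulation of why the query above returns too much. A `match` query uses the OR operator by default, so any document that shares a single term with the query matches. Whitespace splitting plus lowercasing is an assumed stand-in for the standard analyzer, and the helper names are hypothetical.

```python
# Simulation only: approximates a `match` query (default operator OR) on an
# analyzed text field, using whitespace split + lowercase as the "analyzer".

DOCS = {1: "John", 2: "John Brown", 3: "Brown"}


def terms(text: str) -> set[str]:
    return {w.lower() for w in text.split()}


def match_or(query: str, doc_name: str) -> bool:
    # With OR, sharing one term with the query is enough to match.
    return bool(terms(query) & terms(doc_name))


def run(query: str) -> list[int]:
    return [doc_id for doc_id, name in DOCS.items() if match_or(query, name)]


if __name__ == "__main__":
    print(run("John"))  # -> [1, 2]; document 2 should not be returned in case 1
```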

If you require different behaviour based on the number of terms (a partial match suddenly becomes an exact match), then you need to change your query strategy based on the number of terms; Elasticsearch cannot solve this for you. You could, however, score the exact match higher and thus make sure that it is the most relevant hit, in the first position.
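One way to sketch this suggestion, assuming the default dynamic mapping, which adds a `name.keyword` sub-field next to the text field. For a single word the query switches to an exact `term` lookup on the keyword sub-field. For several words it keeps the `match` query but adds a boosted `term` clause so an exact full-value match ranks first. The boost of 10 is an arbitrary illustrative value, and keyword matching is case-sensitive.

```python
def build_query(user_query: str) -> dict:
    """Choose a query strategy by word count (illustrative sketch).

    Assumes a "name.keyword" sub-field, as created by default dynamic mapping.
    """
    words = user_query.split()
    normalized = " ".join(words)
    if len(words) == 1:
        # Single word: only documents whose whole value equals the word.
        return {"query": {"term": {"name.keyword": {"value": normalized}}}}
    return {
        "query": {
            "bool": {
                # Partial matches are still returned...
                "must": [{"match": {"name": normalized}}],
                # ...but an exact full-value match gets a large score bonus.
                "should": [
                    {"term": {"name.keyword": {"value": normalized, "boost": 10}}}
                ],
            }
        }
    }
```

Note that for multi-word queries this only affects ranking: documents with extra words (e.g. "John Brown" for "Brown Abc") are still returned.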

A side note on minimum_should_match: you can specify a percentage, which might help in the last case.
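To show what a percentage turns into, here is a sketch of the arithmetic as described in the Elasticsearch `minimum_should_match` documentation. A positive percentage is rounded down, a negative one says how many clauses may be missing, and the result never goes below 1 or above the number of optional clauses. Only plain integers and percentages are handled; combinations such as "3<90%" are left out, and the function name is hypothetical.

```python
def required_matches(spec, optional_clauses: int) -> int:
    """Number of optional clauses that must match for a given spec."""
    s = str(spec).strip()
    if s.endswith("%"):
        pct = int(s[:-1])
        if pct >= 0:
            # Positive percentage of the clauses, rounded down.
            n = optional_clauses * pct // 100
        else:
            # Negative percentage: this share (rounded down) may be missing.
            n = optional_clauses - optional_clauses * -pct // 100
    else:
        value = int(s)
        # Negative integer: this many clauses may be missing.
        n = value if value >= 0 else optional_clauses + value
    # Never fewer than 1 nor more than the number of optional clauses.
    return max(1, min(optional_clauses, n))
```

For example, with the two query words of "Brown Abc", `required_matches("50%", 2)` gives 1, so a single matching word is enough.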

Thank you for this idea, but I forgot to mention that I would also like to use the fuzziness option, which is not possible with term-level queries such as the terms set query.
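For reference, here is the same "every document word is covered by the query" rule with fuzziness added, in plain Python. It assumes fuzziness AUTO's default thresholds (0 edits for terms shorter than 3 characters, 1 edit for 3 to 5, 2 edits above that), applied to the length of the query term. It uses plain Levenshtein distance, while Elasticsearch counts a transposition as a single edit by default, so this sketch is slightly stricter. All helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def auto_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO with its default thresholds (3, 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_contained(query: str, doc_name: str) -> bool:
    """True if every document word is within AUTO distance of some query word."""
    q_terms = {w.lower() for w in query.split()}
    d_terms = {w.lower() for w in doc_name.split()}
    return all(
        any(levenshtein(q, d) <= auto_edits(q) for q in q_terms) for d in d_terms
    )
```

For example, `fuzzy_contained("Jon Brwn", "John Brown")` is true, while `fuzzy_contained("John", "John Brown")` is false because "brown" is not covered.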

Currently, I'm looking for a general solution based on modifying the data/index structure (e.g. adding a counter for the words in the field). I had an idea to look for documents that have a number of words lower than or equal to the number of words in the query, but you could also have a document with the name "John Brown Brown", and if you search for just "John Brown", you won't find it, even though it fulfills the task's requirements.
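A sketch of such an index-time counter that avoids the "John Brown Brown" problem: count distinct words instead of all words, so that name gets a count of 2 and still fits the two-word query "John Brown". A count comparison alone would still accept a name like "Abc Def" for that query. These stored fields are meant to be paired with a per-document threshold, such as `minimum_should_match_field` in a terms_set query. The field names `name_terms` and `name_term_count` are hypothetical.

```python
def index_fields(name: str) -> dict:
    """Extra fields to store next to "name" when indexing (hypothetical names)."""
    distinct = sorted({w.lower() for w in name.split()})
    return {
        "name": name,
        "name_terms": distinct,            # keyword array: distinct lowercased words
        "name_term_count": len(distinct),  # integer: number of distinct words
    }
```

For example, `index_fields("John Brown Brown")` stores `["brown", "john"]` with a count of 2.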

As for minimum_should_match, thank you for the advice, but when you search for "John Smith Brown" you would want to get all documents whose value is exactly "John", "Smith", "Brown", "John Smith", and so on. So in my opinion this parameter should always be equal to 1.
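A small illustration of why no fixed minimum_should_match fits this requirement. It uses hypothetical documents "John", "John Smith" and "John Abc" with the query "John Smith Brown", and plain whitespace/lowercase tokenisation as a simplifying assumption. "John" and "John Abc" each match exactly one query term, yet only "John" is wanted. A minimum of 1 lets "John Abc" in, and anything higher drops "John". What separates them is the document's own word count, which is a per-document threshold.

```python
# Hypothetical example data for the query "John Smith Brown".
DOCS = {1: "John", 2: "John Smith", 3: "John Abc"}
QUERY = "John Smith Brown"


def words(text: str) -> set[str]:
    return {w.lower() for w in text.split()}


def report(query: str, docs: dict[int, str]) -> list[tuple[int, int, int, bool]]:
    """For each doc: (id, query terms matched, doc word count, wanted?)."""
    rows = []
    for doc_id, name in docs.items():
        matched = len(words(query) & words(name))
        doc_words = len(words(name))
        # Wanted only when every word of the document was matched.
        rows.append((doc_id, matched, doc_words, matched == doc_words))
    return rows


if __name__ == "__main__":
    for row in report(QUERY, DOCS):
        print(row)
```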
