Find documents containing no more terms than the query, with fuzziness

karolix1279 · July 30, 2021, 7:28am

Let's assume there are three documents in the database:

{name: "John"}
{name: "John Brown"}
{name: "Brown"}

I would like to search documents in the following way:

Search: "John". Result: document 1
Search: "John Brown". Result: document 1 and document 2 and document 3
Search: "Brown John". Result: document 1 and document 2 and document 3
Search: "John Brown Abc". Result: document 1 and document 2 and document 3
Search: "Brown Abc". Result: document 3

General scenario: I would like to find all documents whose words are all included in the words used in the query. The query may have more words than the document, but the document can't have more words than the query.
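To pin the rule down, here is a minimal reference sketch in plain Python, not an Elasticsearch query. It assumes that words are compared case-insensitively and split on whitespace; the helper names `tokenize`, `matches`, and `search` are made up for illustration. It only restates the expected results above as a subset test.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a rough stand-in for an analyzer)."""
    return text.lower().split()


def matches(doc_name, query):
    """True when every word of the document also appears in the query."""
    return set(tokenize(doc_name)) <= set(tokenize(query))


# The three example documents, in the order given above.
docs = ["John", "John Brown", "Brown"]


def search(query):
    """Return the 1-based positions of the documents that satisfy the rule."""
    return [i for i, name in enumerate(docs, start=1) if matches(name, query)]
```

For example, `search("Brown Abc")` keeps only the third document, because "John" is not among the query words.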

spinscale · August 2, 2021, 7:44am

I would take a look at the match query in combination with minimum_should_match as a starter.

Christian_Dahlqvist · August 2, 2021, 9:12am

It sounds like you may need to use a terms set query.
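A rough sketch of how that might look, written as Python dictionaries for the request bodies. The assumptions here are mine, not something confirmed in this thread: a `token_count` sub-field (hypothetically named `name.length`) stores the number of tokens in `name`, and the terms set query uses it as `minimum_should_match_field`, so a document only matches when at least as many query terms match as it has tokens. The terms set query does not analyze its terms, so the sketch lowercases them client-side to line up with the standard analyzer.

```python
def build_mapping():
    """Index mapping with a token_count sub-field (the name 'length' is hypothetical)."""
    return {
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "length": {"type": "token_count", "analyzer": "standard"}
                    },
                }
            }
        }
    }


def build_terms_set_query(search_text):
    """Terms set query: required matches per document = its stored token count."""
    # Terms are not analyzed by terms_set, so lowercase and de-duplicate here.
    terms = sorted(set(search_text.lower().split()))
    return {
        "query": {
            "terms_set": {
                "name": {
                    "terms": terms,
                    "minimum_should_match_field": "name.length",
                }
            }
        }
    }
```

Two caveats: this matches exact terms only, so there is no fuzziness, and a document with repeated words has a token count higher than its number of distinct terms, so it would not match.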

karolix1279 · August 2, 2021, 9:46am

I tried to use something like that, but the problem was with the first case ("Search: "John". Result: document 1"). The "name" field is of the text type, so the query returns all documents containing the term John, for example "John Brown", "John Abc", ..., while it should only return documents with the value "John" and nothing more. On the other hand, if the field were of the "keyword" type, then case 4 would not work.

The query that I used:

{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "John"}}
      ],
      "minimum_should_match": 1
    }
  }
}

spinscale · August 2, 2021, 10:14am

If you require different behaviour based on the number of terms (suddenly a partial match becomes an exact match), then you need to change your query strategy based on the number of terms. Elasticsearch cannot solve this for you. You could, however, score the exact match higher and thus make sure that it is the most relevant hit, in the first position.

Side note for minimum_should_match. You can specify a percentage, that might help in the last case.
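As a sketch of the scoring idea, the request body could be built like this in Python. It assumes a keyword sub-field, hypothetically named `name.keyword`, holding the unanalyzed value; the boost of 2.0 and the function name are arbitrary choices for illustration. Note that this only ranks the exact match first, it does not exclude the other hits.

```python
def build_boosted_query(search_text, min_match="1"):
    """Match on the text field, and boost documents whose whole value equals the input."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "name": {
                                "query": search_text,
                                # Accepts a number or a percentage such as "50%".
                                "minimum_should_match": min_match,
                            }
                        }
                    }
                ],
                "should": [
                    # Assumed keyword sub-field; the term is compared as-is (case-sensitive).
                    {"term": {"name.keyword": {"value": search_text, "boost": 2.0}}}
                ],
            }
        }
    }
```

Calling `build_boosted_query("John Brown Abc", min_match="50%")` would use the percentage form mentioned in the side note.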

karolix1279 · August 2, 2021, 4:32pm

Thank you for this idea, but I forgot to mention that I would also like to use the fuzziness option, which is not possible with a term query.

karolix1279 · August 2, 2021, 5:19pm

Currently, I'm looking for a general solution based on modifying the data/index structure (e.g. adding a word counter for the field). I had the idea of looking for documents that have the same number of words as the query or fewer, but you could also have a document with the name "John Brown Brown", and if you search just for "John Brown" you won't find it, even though it fulfills the task's requirements.
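One way around the repeated-word problem might be to count distinct words instead of all words when the document is indexed. A minimal sketch, assuming whitespace tokenization and lowercasing; the field name `name_distinct_count` and the helper are hypothetical:

```python
def with_distinct_count(doc):
    """Return a copy of the document with a count of its distinct lowercase words."""
    distinct_words = set(doc["name"].lower().split())
    # 'name_distinct_count' is a made-up field name for the stored counter.
    return {**doc, "name_distinct_count": len(distinct_words)}
```

With this, "John Brown Brown" gets a counter of 2, the same as "John Brown", so comparing the counter with the number of matched query terms no longer rejects it. Such a field could then serve as the `minimum_should_match_field` of a terms set query.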

As for minimum_should_match, thank you for the advice, but when you search for "John Smith Brown" you would want to get all documents whose value is exactly "John" or "Smith" or "Brown" or "John Smith"... So in my opinion this parameter should always be equal to 1.
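Since the fuzziness requirement rules out exact-term approaches, one possible fallback is to fetch candidates with a fuzzy match query and apply the subset rule client-side. The sketch below is only that, under several assumptions: the helper names are made up, the distance is plain Levenshtein (Elasticsearch counts a transposition as a single edit by default, so results can differ), and a fixed `max_edits` of 1 stands in for what `"fuzziness": "AUTO"` would choose per term length.

```python
def build_candidate_query(search_text):
    """Fuzzy match query returning every document that shares at least one term."""
    return {
        "query": {
            "match": {
                "name": {
                    "query": search_text,
                    "fuzziness": "AUTO",
                    "minimum_should_match": 1,
                }
            }
        }
    }


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_subset(doc_name, search_text, max_edits=1):
    """True when every document word is within max_edits of some query word."""
    query_tokens = search_text.lower().split()
    return all(
        any(edit_distance(token, q) <= max_edits for q in query_tokens)
        for token in doc_name.lower().split()
    )
```

The candidates returned for `build_candidate_query(...)` would then be filtered with `fuzzy_subset(hit_name, search_text)`. The filtering happens outside Elasticsearch, so paging and hit counts would have to be handled by the client.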

system · August 30, 2021, 5:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
