Let's assume there are three documents in the database:
{name: "John"}
{name: "John Brown"}
{name: "Brown"}
I would like to search documents in the following way:
1. Search: "John". Result: document 1
2. Search: "John Brown". Result: document 1 and document 2 and document 3
3. Search: "Brown John". Result: document 1 and document 2 and document 3
4. Search: "John Brown Abc". Result: document 1 and document 2 and document 3
5. Search: "Brown Abc". Result: document 3
General scenario: I would like to find every document whose words are all included in the words of the query. The query may contain more words than the document, but the document must not contain any word that is missing from the query.
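To make the rule concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) of the matching behaviour described above. It assumes whitespace tokenization and case-insensitive comparison, neither of which is stated in the question; the function names are only illustrative.

```python
# The three example documents, numbered 1..3 as in the question.
documents = ["John", "John Brown", "Brown"]


def matches(document_name: str, query: str) -> bool:
    """Return True when every word of the document also occurs in the query."""
    # Assumption: words are split on whitespace and compared case-insensitively.
    doc_words = set(document_name.lower().split())
    query_words = set(query.lower().split())
    # Subset test: the query may hold extra words, the document may not.
    return doc_words <= query_words


def search(query: str) -> list[int]:
    """Return the 1-based numbers of all documents matching the query."""
    return [
        number
        for number, name in enumerate(documents, start=1)
        if matches(name, query)
    ]
```

With these definitions, `search("John")` gives `[1]` and `search("Brown Abc")` gives `[3]`, as in the cases listed above.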
I tried to use something like that, but the problem was the first case ("1. Search: "John". Result: document 1"). The "name" field is of type text, so the query returns all documents containing the term John, for example "John Brown", "John Abc", ..., while it should return only documents whose value is "John" and nothing more. On the other hand, if the field were of type "keyword", case 4 would not work.
If you require different behaviour based on the number of terms (suddenly a partial match becomes an exact match), then you need to change your query strategy based on the number of terms; Elasticsearch cannot solve this for you. You could, however, score the exact match higher and thus make sure that it is the most relevant hit, in the first position.
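A rough sketch of what scoring the exact match higher could look like, written as a Python function that builds the request body. It assumes the mapping has a `name.keyword` sub-field of type keyword next to the text field; the function name and the boost value are arbitrary choices for illustration.

```python
def build_boosted_query(text: str, exact_boost: float = 5.0) -> dict:
    """Build a search body that matches on words but ranks exact values first."""
    return {
        "query": {
            "bool": {
                # Any document sharing at least one word with the query matches.
                "must": [{"match": {"name": {"query": text}}}],
                # Assumption: "name.keyword" exists as a keyword sub-field.
                # A document whose whole value equals the query gets extra score.
                "should": [
                    {"term": {"name.keyword": {"value": text, "boost": exact_boost}}}
                ],
            }
        }
    }
```

Note that this only changes the ordering: documents such as "John Brown" would still be returned for the query "John", just below the exact hit.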
A side note on minimum_should_match: you can specify a percentage, which might help in the last case.
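For illustration, a percentage can be passed to a match query roughly like this. The function name and the "75%" default are placeholders, not a recommendation.

```python
def build_min_match_query(text: str, minimum: str = "75%") -> dict:
    """Build a match query that requires a share of the query words to match."""
    return {
        "query": {
            "match": {
                "name": {
                    "query": text,
                    # A percentage of the query terms that must be present;
                    # an absolute number such as "1" is accepted as well.
                    "minimum_should_match": minimum,
                }
            }
        }
    }
```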
Currently, I'm looking for a general solution based on modifying the data/index structure (e.g. adding a counter for the words in the field). I had an idea to look for documents that have fewer words than the query, or the same number, but you could also have a document with the name "John Brown Brown", and if you search for just "John Brown" you won't find it, even though it fulfills the task's requirements.
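One possible way to take the counter idea further, sketched under assumptions: store the unique words of the name in a keyword field together with their count, and query with a `terms_set` query whose `minimum_should_match_field` points at that count. This technique was not suggested in the thread. The field names `name_tokens` and `name_token_count` are hypothetical, and the sketch assumes `name_tokens` is mapped as keyword and `name_token_count` as a numeric type. Counting unique words rather than all words is what lets "John Brown Brown" be found by "John Brown".

```python
def to_indexed_document(name: str) -> dict:
    """Prepare a document with its unique lowercase words and their count."""
    # Assumption: whitespace tokenization, lowercased, duplicates removed.
    tokens = sorted(set(name.lower().split()))
    return {
        "name": name,
        "name_tokens": tokens,            # hypothetical keyword field
        "name_token_count": len(tokens),  # hypothetical numeric field
    }


def build_terms_set_query(text: str) -> dict:
    """Build a terms_set query over the unique lowercase words of the query."""
    terms = sorted(set(text.lower().split()))
    return {
        "query": {
            "terms_set": {
                "name_tokens": {
                    "terms": terms,
                    # A document matches only if the number of its tokens found
                    # among the query terms reaches its own stored token count,
                    # i.e. all of its words occur in the query.
                    "minimum_should_match_field": "name_token_count",
                }
            }
        }
    }
```

With this layout the query side stays the same for any number of words; the per-document count decides how many of the supplied terms have to match.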
As for minimum_should_match, thank you for the advice, but when you search for "John Smith Brown" you would like to get all documents whose value is exactly "John" or "Smith" or "Brown" or "John Smith"... So in my opinion this parameter should always be equal to 1.