Exact match search with permutation of words

Hi guys

I am kinda stuck with a problem in Elasticsearch. I want to do an exact match search that accepts any permutation of the words.

Suppose I have a doc with the text "paper boat". I want an exact search such that the doc comes back in the response when I query with either "paper boat" or "boat paper".

More examples

| Text | Valid search values |
|---|---|
| "hell boy" | "hell boy", "boy hell" |
| "apple pie" | "pie apple", "apple pie" |

Is there any analyzer or anything else that solves this problem? Please help me out on this.

This is the default behavior unless I don't understand the problem.

As I understand it, with the default behaviour I would also get "apple pie" back when searching for just "apple" or just "pie". I want an exact string match, such that "pie apple" and "apple pie" are the only valid search strings for the text "apple pie" ("apple" or "pie" alone is not valid for "apple pie" in my case).

I believe you can use a phrase search with slop = 2. See https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-match-query-phrase.html
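For reference, a minimal sketch of what that request body could look like, built as a plain Python dict. The helper name `phrase_query` is made up for illustration, and the field name `search_term` is borrowed from the examples further down the thread. A slop of 2 is what a phrase query needs to tolerate two adjacent terms being swapped.

```python
import json


def phrase_query(field, text, slop=2):
    """Build a match_phrase request body (hypothetical helper).

    slop=2 is enough for two adjacent terms to appear in swapped order.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


# Assumption: the text lives in a field called "search_term".
print(json.dumps(phrase_query("search_term", "boat paper"), indent=2))
```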

That will also return the doc for a single-token search. For example, the text "apple pie" is broken into two tokens ["apple", "pie"], and if I search using a match_phrase query with either one of the tokens ("apple" or "pie"), it returns the text "apple pie", which is not valid for my use case. The only valid search strings for the text "apple pie" in my use case are ["apple pie", "pie apple"].

In my use case, all tokens of the text are mandatory for the search, but the tokens can be in reverse order or any other order.

You can split the text into individual tokens and use a bool query with one must clause for each of these tokens.
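A rough sketch of that suggestion, again as a Python dict. The helper name `all_tokens_query`, the lowercase-and-split tokenization, and the `search_term` field are assumptions made for illustration only.

```python
import json


def all_tokens_query(field, text):
    """Build a bool query with one `match` clause per token (hypothetical helper).

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    return {"query": {"bool": {"must": [{"match": {field: t}} for t in tokens]}}}


print(json.dumps(all_tokens_query("search_term", "pie apple"), indent=2))
```

Every clause under `must` has to match, so a document is returned only if it contains all of the search tokens, in any order.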

But the doc must come back in the response only when all tokens match, not just any of them.

That means, for the keywords "apple", "pie":

valid strings: "apple pie is good", "good apple pie"

invalid strings: "pie is bad", "apple is bad"

When you put two conditions (one for apple and one for pie) in a must clause, both queries have to match.

Yes, but if I have only one of the terms, either "apple" or "pie", in the search string, then the doc ("apple", "pie") will also be matched, which I don't want.

My condition is that it should match only when the search string contains both terms ("apple" and "pie").

e.g.

for the indexed terms "apple pie":

When I search for the string "apple is good", it should not match because it does not contain both terms "apple" and "pie".

But for the search string "apple is good pie", it should match because it has both "apple" and "pie".

But in your last example your documents should not match because they don't contain "is" and "good".

No, it should match, since my string contains both terms of the indexed document, which are "apple" and "pie".

e.g.
"apple pie" - is indexed doc in ES
{"search_term": "apple pie", "category": "apple product"} - indexed doc

1. {
  "query": { "match": { "search_term": "apple is good pie" } }
}
2. {
  "query": { "match": { "search_term": "apple is good" } }
}

1 is valid, since it contains both apple and pie.
2 is not valid, since it does not contain both.

How can Elasticsearch know that apple and pie are the terms you are looking for but is and good are not? Are you going to provide a list of stopwords?

Do you mean that all the terms in the document must match terms provided by the user?

No, I will not provide any list of stopwords.

Yes, if all the terms of a document are contained in the user's terms, that document should be returned in the response.

e.g.
1-"apple pie" - is indexed doc in ES
{"search_term": "apple pie", "category": "apple product"} - indexed doc

2-"banana pie" - is indexed doc in ES
{"search_term": "banana pie", "category": "banana product"} - indexed doc

3-"banana shake" - is indexed doc in ES
{"search_term": "banana shake", "category": "banana product"} - indexed doc

Query:
{
  "query": { "match": { "search_term": "I like apple banana pie" } }
}

I only want the 1st and 2nd docs returned, because the user's terms contain all the terms of the 1st ("apple pie") and 2nd ("banana pie") docs, but not the 3rd doc, since the user's terms contain only "banana" and not "shake".

@jpountz any idea?

Hi @jpountz

Please help me out on this if you have any idea.

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

Apologies for being impatient, I will keep this in mind. Thank you for the info :grinning:

I see, you want the query terms to be a superset of the indexed terms. We support this, but it requires some manual work. First you need to index the terms of each document as an array, and a separate field must record the number of terms, e.g.

{
  "tokens": [ "apple", "pie" ],
  "token_count": 2
}

Then at search time you can use the terms_set query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html):

{
  "query": {
    "terms_set": {
      "tokens": {
        "terms": [ "apple", "banana", "pie" ],
        "minimum_should_match_field": "token_count"
      }
    }
  }
}

Note that tokens must be mapped as a keyword field.
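To tie the steps above together, here is a rough Python sketch of the client-side preparation. All helper names (`tokenize`, `build_doc`, `build_query`, `would_match`) and the `INDEX_BODY` constant are hypothetical. It assumes tokens are produced by lowercasing and splitting on whitespace (a keyword field is not analyzed, so the client has to normalize), and it assumes duplicates should be dropped so that `token_count` counts distinct terms. The mapping uses the typeless form of Elasticsearch 7 and later; 6.x would need a type name. `would_match` only emulates the intended superset check locally and is not a call into Elasticsearch.

```python
def tokenize(text):
    """Lowercase, split on whitespace, drop duplicates while keeping order (assumption)."""
    return list(dict.fromkeys(text.lower().split()))


def build_doc(text):
    """Document body with the token array and the number of distinct tokens."""
    tokens = tokenize(text)
    return {"search_term": text, "tokens": tokens, "token_count": len(tokens)}


def build_query(user_text):
    """terms_set request body: every indexed token must appear among the user's terms."""
    return {
        "query": {
            "terms_set": {
                "tokens": {
                    "terms": tokenize(user_text),
                    "minimum_should_match_field": "token_count",
                }
            }
        }
    }


def would_match(doc, query):
    """Local emulation of the intended semantics, NOT an Elasticsearch call."""
    terms = set(query["query"]["terms_set"]["tokens"]["terms"])
    hits = sum(1 for t in doc["tokens"] if t in terms)
    return hits >= doc["token_count"]


# Assumed index body: tokens as keyword, token_count as a numeric field.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "tokens": {"type": "keyword"},
            "token_count": {"type": "long"},
        }
    }
}

docs = [build_doc(t) for t in ("apple pie", "banana pie", "banana shake")]
query = build_query("I like apple banana pie")
print([d["search_term"] for d in docs if would_match(d, query)])
# prints ['apple pie', 'banana pie']
```

With the three documents from earlier in the thread, only "apple pie" and "banana pie" pass the check, because "shake" is missing from the user's terms.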


Thanks a lot, @jpountz. Your reply is a lifesaver :heart_eyes:

Thanks again.:grinning:
