Exact match search with permutation of words


(Ankur Singla) #1

Hi guys

I'm kind of stuck on a problem in Elasticsearch. I want an exact-match search that accepts any permutation of the words.

Suppose I have a doc with the text "paper boat". I want an exact search that returns this doc when I query with either "paper boat" or "boat paper".

More examples

| Text | Valid search values |
|------|---------------------|
| "hell boy" | "hell boy", "boy hell" |
| "apple pie" | "pie apple", "apple pie" |

Is there an analyzer or some other way to solve this problem? Please help me out on this.


(David Pilato) #2

This is the default behavior, unless I have misunderstood the problem.


(Ankur Singla) #3

As I understand it, with the default behaviour a search for just "apple" or just "pie" will also return "apple pie". I want an exact string match in which "pie apple" and "apple pie" are the only valid search strings for the text "apple pie" ("apple" and "pie" on their own are not valid in my case).


(David Pilato) #4

I believe you can use a phrase search with slop = 2. See https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-match-query-phrase.html
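
For reference, a minimal sketch of such a phrase query, built as a Python dict so it can be printed as JSON. The field name `search_term` and the helper name `match_phrase_query` are assumptions for illustration. A slop of 2 is what allows two adjacent terms to swap places.

```python
import json


def match_phrase_query(field, text, slop=2):
    """Build a match_phrase request body (hypothetical helper).

    slop=2 lets two adjacent terms appear in transposed order.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


# "search_term" is an assumed field name.
print(json.dumps(match_phrase_query("search_term", "pie apple"), indent=2))
```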


(Ankur Singla) #5

That will also return the doc for a single-token search. For example, the text "apple pie" is broken into two tokens, ["apple", "pie"], and if I search with a match_phrase query using either token ("apple" or "pie"), it returns the text "apple pie", which is not valid for my use case. The only valid search strings for the text "apple pie" are ["apple pie", "pie apple"].

In my use case, all tokens of the text are mandatory for the search, but the tokens can be in reverse order or any other order.


(Farshad N. Shoushtari) #6

You can split the text into individual tokens and use a bool query with a must clause for each of these tokens.


(Ankur Singla) #7

But it must be returned only when all tokens match, not just any of them.

That is, for the keywords "apple", "pie":

valid strings: "apple pie is good", "good apple pie"

invalid strings: "pie is bad", "apple is bad"


(Farshad N. Shoushtari) #8

When you put two conditions (one for apple and one for pie) in a must, both queries have to match.
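
A rough sketch of that idea, assuming simple whitespace tokenization and lowercasing (a real analyzer may split the text differently). The helper name `all_tokens_query` and the field name `search_term` are made up for illustration. It puts one `match` clause per search token inside `bool.must`, so every token of the search string has to be present in the document.

```python
import json


def all_tokens_query(field, text):
    """Build a bool query with one must clause per token (hypothetical helper)."""
    # Assumption: whitespace splitting is good enough for this illustration.
    tokens = text.lower().split()
    return {"query": {"bool": {"must": [{"match": {field: t}} for t in tokens]}}}


print(json.dumps(all_tokens_query("search_term", "apple pie"), indent=2))
```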


(Ankur Singla) #9

Yes, but if the search string contains only one of the terms, either "apple" or "pie", the doc ("apple", "pie") will still be matched, which I don't want.

My condition is that it should match only when the search string contains both terms ("apple" and "pie").

E.g., for the indexed terms "apple pie":

When I search for the string "apple is good", it should not match, because the string does not contain both "apple" and "pie".

But the search string "apple is good pie" should match, because it contains both "apple" and "pie".


(David Pilato) #10

But in your last example your documents should not match because they don't contain "is" and "good".


(Ankur Singla) #11

No, it should match, since my string contains both terms of the indexed document, "apple" and "pie".

E.g., the indexed doc in ES is "apple pie":
{"search_term": "apple pie", "category": "apple product"}

Query 1:
{ "query": { "match": { "search_term": "apple is good pie" } } }

Query 2:
{ "query": { "match": { "search_term": "apple is good" } } }

Query 1 is valid, since it contains both "apple" and "pie".
Query 2 is not valid, since it does not contain both.


(David Pilato) #12

How can Elasticsearch know that "apple" and "pie" are the terms you are looking for but "is" and "good" are not? Are you going to provide a list of stopwords?

Do you mean that all the terms in the document must match terms provided by the user?


(Ankur Singla) #13

No, I will not provide any list of stopwords.

Yes, if all the terms of a document are contained in the user's terms, that document should be returned in the response.

E.g., these docs are indexed in ES:

1. "apple pie"
{"search_term": "apple pie", "category": "apple product"}

2. "banana pie"
{"search_term": "banana pie", "category": "banana product"}

3. "banana shake"
{"search_term": "banana shake", "category": "banana product"}

Query:
{ "query": { "match": { "search_term": "I like apple banana pie" } } }

I only want the 1st and 2nd docs returned, because the user's terms contain all the terms of the 1st ("apple pie") and 2nd ("banana pie") docs. The 3rd doc should not be returned, since the user's terms contain only "banana", not "shake".
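
Put differently, the requirement is a subset test: a doc matches when its set of terms is contained in the set of user terms. The sketch below models that in plain Python. It is not an Elasticsearch query, and the whitespace tokenization, the lowercasing, and the function name `matching_docs` are assumptions for illustration.

```python
def matching_docs(docs, user_text):
    """Return docs whose terms are all present in the user's search text."""
    user_terms = set(user_text.lower().split())
    return [
        d for d in docs
        # A doc qualifies only if its term set is a subset of the user terms.
        if set(d["search_term"].lower().split()) <= user_terms
    ]


docs = [
    {"search_term": "apple pie", "category": "apple product"},
    {"search_term": "banana pie", "category": "banana product"},
    {"search_term": "banana shake", "category": "banana product"},
]

for doc in matching_docs(docs, "I like apple banana pie"):
    print(doc["search_term"])
```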


(David Pilato) #14

@jpountz any idea?


(Ankur Singla) #15

Hi @jpountz

Please help me out on this if you have any idea.


(David Pilato) #16

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.


(Ankur Singla) #17

Apologies for being impatient. I will keep this in mind. Thank you for the info :grinning:


(Adrien Grand) #18

I see, you want the query terms to be a superset of the indexed terms. We support this, but it requires some manual work. First you need to index the terms of each document as an array, and a separate field must record the number of terms, e.g.:

{
  "tokens": [ "apple", "pie" ],
  "token_count": 2
}

Then at search time you can use the terms_set query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html):

{
  "query": {
    "terms_set": {
      "tokens": {
        "terms": [ "apple", "banana", "pie" ],
        "minimum_should_match_field": "token_count"
      }
    }
  }
}

Note that tokens must be mapped as a keyword field.
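
A minimal Python sketch of preparing both sides, assuming whitespace tokenization and lowercasing. The helper names `to_index_doc` and `terms_set_query` are hypothetical. Tokens are deduplicated on the assumption that `token_count` should equal the number of distinct terms the query can match. The mapping is written in the typeless style of Elasticsearch 7 and later; 6.x would need a type name level under `mappings`.

```python
import json

# Assumed mapping: "tokens" as keyword (as noted above), "token_count" as integer.
MAPPING = {
    "mappings": {
        "properties": {
            "tokens": {"type": "keyword"},
            "token_count": {"type": "integer"},
        }
    }
}


def to_index_doc(text):
    """Turn raw text into the tokens/token_count document shape."""
    # Sorted only to make the output deterministic; order does not matter here.
    tokens = sorted(set(text.lower().split()))
    return {"tokens": tokens, "token_count": len(tokens)}


def terms_set_query(user_text):
    """Build a terms_set query from the user's search text."""
    terms = sorted(set(user_text.lower().split()))
    return {
        "query": {
            "terms_set": {
                "tokens": {
                    "terms": terms,
                    # Each doc must match as many terms as its own token_count.
                    "minimum_should_match_field": "token_count",
                }
            }
        }
    }


print(json.dumps(to_index_doc("apple pie")))
print(json.dumps(terms_set_query("I like apple banana pie"), indent=2))
```

With this shape, a doc holding ["banana", "shake"] needs two matching terms but only "banana" is in the query, so it would not be returned.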


(Ankur Singla) #19

Thanks a lot, @jpountz. Your reply is a lifesaver :heart_eyes:

Thanks again.:grinning: