How to solve this using ElasticSearch and NLP features

I need help modeling and solving this problem with Elasticsearch and its NLP features.

We will get a query and we have to extract the product and the feature out of it.

Products are like:
1. JNM Gold Plan
2. JNM Gold & More
3. JNM Platinum
4. KMM Platinum
5. Express Plan
6. Mac and John’s Gold Plan
7. Mac and John’s Platinum

Features are like:
ROI / Rate of Interest
Accelerated Rewards
USP / Milestone Rewards
Late Payment charges
Reward Points

Example of queries:
“Give me reward points for JNM gold plan”
“Tell me about express plan for Mac and John’s gold plan”
Or with typos: “points rawerds KM platanum”

The task: given, say, the first query “Give me reward points for JNM gold plan”, we have to figure out which product and which feature it maps to. For this statement it maps to <product:JNM Gold plan> and <feature:Reward Points>, and that is exactly what we need to extract.

Complexity is added because the statement could be phrased in more than one way, e.g.:
C1: Different sentence form: the sentence is written in a different order, e.g. “Tell JNM gold what is the rewards point”
C2: Typos and acronyms: “Give me rwd pts for jnm gld plan”
C3: Overlapping features: “Give me rewards point for JNM Gold Plan” should detect the feature as <Reward Points> and not as another feature with overlapping words

I come from an image processing background and this is a new area for me. To me it looked similar to “feature extraction” in image processing, but so far I have attempted these and failed:

Used phonetic matching and it failed badly, as it does not do well with C2 above.
Fuzzy query was good at the term level, e.g. if we give gld, it would match with gold. But if the entire phrase is given, “give me reward points for jnm gld plan”, it would not match with “JNM Gold Plan” because of edit distance issues (I guess).
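
The edit-distance guess can be checked roughly with a small Python sketch. This is not Elasticsearch code; `levenshtein` is a helper written only for illustration. Elasticsearch fuzzy matching allows at most 2 edits per term, so a single token like "gld" is within range of "gold", while a whole sentence compared against a whole product name is far outside it.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


query = "give me reward points for jnm gld plan"
product = "jnm gold plan"

# Term level: one missing letter, well inside the 2-edit limit.
print(levenshtein("gld", "gold"))

# Phrase level: the extra words alone push the distance far past 2.
print(levenshtein(query, product))
```

So fuzzy matching has to be applied per token (or per shingle), not to the whole sentence at once.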

Now, as my tricks are not working out, I need some direction on how to approach such problems or how to do this mapping.

Would POS tagging help in such cases? Can you suggest what approach should be taken to solve this problem of extracting two categories of phrases from a search query?

Need inputs, badly stuck with this problem.


Are users actually typing "Give me reward points of JNM gold plan"? This seems odd, what is your use case?

Analyzers can be defined for search time, not just index time. So you could apply a stopword/stemming analyzer during search.
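
A minimal sketch of what such a setup might look like, written as a Python dict that could be sent as the body of a create-index request. The field name `catalog_text`, the analyzer and filter names, and the stopword list are all assumptions for illustration. Both analyzers stem, so that "points" in the query and in the indexed text reduce to the same token; only the search analyzer drops filler words.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical list of filler words to drop from user queries.
                "query_stop": {
                    "type": "stop",
                    "stopwords": ["give", "me", "tell", "about", "what", "is", "the", "of", "for"],
                }
            },
            "analyzer": {
                # Index time: lowercase and stem, keep every word.
                "catalog_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                },
                # Search time: same chain, plus stopword removal.
                "query_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "query_stop", "porter_stem"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "catalog_text": {  # hypothetical field holding product/feature names
                "type": "text",
                "analyzer": "catalog_analyzer",
                "search_analyzer": "query_analyzer",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```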

What you could also do is create an additional field where the source data of the product and features are joined and use 'shingles'.
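
To show what shingles buy you here, a small pure-Python sketch. The `shingles` function is a hypothetical helper that only imitates the idea of word n-grams; in Elasticsearch itself this would be a `shingle` token filter, for which a possible config is included as a dict. The point is that a multi-word name like "jnm gold plan" becomes a single token that can be matched wherever it appears in the sentence.

```python
def shingles(text, min_size=2, max_size=3):
    """Return word n-grams (shingles) of the given sizes, lowercased."""
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


# Rough Elasticsearch equivalent (filter name is an assumption).
shingle_filter = {
    "catalog_shingles": {
        "type": "shingle",
        "min_shingle_size": 2,
        "max_shingle_size": 3,
        "output_unigrams": True,
    }
}

query_shingles = shingles("give me reward points for jnm gold plan")
print("jnm gold plan" in query_shingles)
print("reward points" in query_shingles)
```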

Depending on the use case you could always preprocess the input from the user, filtering out words like 'give' and 'me'.
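
A minimal sketch of that kind of preprocessing in Python, assuming a hand-picked filler list (`FILLER_WORDS` and `strip_filler` are made-up names, and the list would need tuning against real queries):

```python
import re

# Hypothetical filler words; "and" is left out on purpose because it is
# part of the product name "Mac and John's".
FILLER_WORDS = {"give", "me", "tell", "about", "what", "is", "the", "for", "of"}


def strip_filler(query):
    """Lowercase the query, split it into tokens, and drop filler words."""
    tokens = re.findall(r"[a-z0-9&'’]+", query.lower())
    return [t for t in tokens if t not in FILLER_WORDS]


print(strip_filler("Give me reward points for JNM gold plan"))
print(strip_filler("Tell JNM gold what is the rewards point"))
```

What is left after filtering is mostly product and feature vocabulary, which is much easier to match than the full sentence.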

For typos you can use the term and phrase suggesters. They are not that hard to get working, but you will have to spend time tuning them.
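
As a starting point, a request body for the suggesters might look roughly like this, again as a Python dict. The field names `catalog_text` and `catalog_text.shingles` and the suggestion names are assumptions, and the parameter values are untuned guesses.

```python
import json

suggest_body = {
    "suggest": {
        # Global text shared by both suggesters below.
        "text": "points rawerds KM platanum",
        # Term suggester: corrects each token on its own.
        "term_fix": {
            "term": {
                "field": "catalog_text",   # hypothetical field
                "suggest_mode": "missing",  # only suggest for terms not in the index
            }
        },
        # Phrase suggester: corrects the phrase as a whole, using shingles.
        "phrase_fix": {
            "phrase": {
                "field": "catalog_text.shingles",  # hypothetical shingled subfield
                "size": 1,
                "max_errors": 2,
                "direct_generator": [
                    {"field": "catalog_text", "suggest_mode": "always"}
                ],
            }
        },
    }
}

print(json.dumps(suggest_body, indent=2))
```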

But a bit more background into the use case would help.



Thanks a lot, Maarten, for responding and sharing your ideas. It answered many of my questions.

The use case is that a person using the intended system will write regular English sentences to get answers to queries about products and features, unlike a more experienced user, who just drops relevant keywords into the query text.

Some of the terminology you shared is new to me, so I plan to dig deeper and try out things and then come back as needed with findings and further queries. Thanks again !

Hi Prasoon,

What you are looking for is language modelling: use named entity recognition, POS tagging and maybe TensorFlow RNN models to return the most probable sentences.