Term matching with elastic search edge n gram

venkata_sreekanth_bh · February 3, 2017, 5:23pm

I have a custom analyzer with a char replace filter, a keyword tokenizer and a lowercase filter. The problem I want to solve is as follows:

Original string in index : 1812abcdefg

user input query : 812abcdefg

the index has other strings such as 812c, 812ab etc.

I want to allow one or two characters to precede the edge ngram match. I have tried ngram, but the strings can be of varying length, from 1 to 50 characters, and that throws off the search.

Does anybody have an idea how to do this?

polyfractal · February 3, 2017, 6:45pm

So you essentially need a suffix search? E.g. *812abcdefg?

The best way to do that is add another analysis chain, which includes a reverse filter. Then use a prefix query.

So the reverse filter will index 1812abcdefg as gfedcba2181. When you use a prefix query on that same analysis chain, 812abcdefg will be converted into gfedcba218* and you'll get your match without an expensive suffix wildcard.
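To make the reversal trick concrete, here is a minimal sketch in plain Python that imitates the idea outside Elasticsearch. The function names `analyze_reversed` and `suffix_search` are made up for illustration, and the regex only approximates the poster's char replace filter, whose exact pattern is not given in the thread.

```python
import re
from typing import List


def analyze_reversed(text: str) -> str:
    """Imitate the analysis chain: strip non-word characters,
    keep the whole value as one token, lowercase it, then reverse it."""
    cleaned = re.sub(r"\W+", "", text)  # assumed stand-in for the char replace filter
    return cleaned.lower()[::-1]


def suffix_search(query: str, indexed_values: List[str]) -> List[str]:
    """Return the values whose reversed token starts with the reversed query,
    which is equivalent to the original value ending with the query."""
    prefix = analyze_reversed(query)
    return [v for v in indexed_values if analyze_reversed(v).startswith(prefix)]


docs = ["1812abcdefg", "812c", "812ab"]
print(analyze_reversed("1812abcdefg"))    # gfedcba2181
print(suffix_search("812abcdefg", docs))  # ['1812abcdefg']
```

The point of the sketch is only that a suffix match on the original string becomes a prefix match on the reversed token, and prefix lookups are cheap for an inverted index.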

venkata_sreekanth_bh · February 3, 2017, 9:27pm

I will try that and get back to you

venkata_sreekanth_bh · February 3, 2017, 9:47pm

Is there a way to perform substring search? The reverse filter mentioned above is definitely a good idea, but sometimes the original string in the index can look like

1812abcdefg-pby-reel, 1812abcdefg123, etc. In this scenario the reverse filter may not work. I do use a char replace filter to remove non-word characters.

polyfractal · February 3, 2017, 10:26pm

You can also use ngrams / shingles with the reverse approach. So then you'll be indexing reversed fragments, which will match the prefix query. Basically the exact same analyzer you have now, except add a reverse filter to it.

Also, most people who implement this strategy also index the forward direction too, so that they get both prefix and suffix search.
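A rough sketch of what indexing both directions could look like, written as Python dictionaries that could be sent as request bodies. The field name `part_number`, the sub-field `reversed`, and the analyzer and char filter names are all hypothetical; the mapping layout assumes a typeless (7.x or later) index, whereas the version current when this thread was written would need a type name. The sketch also assumes the prefix query is not analyzed, so it normalizes and reverses the user input on the client side.

```python
import re

# Hypothetical names throughout: part_number, reversed, part_forward,
# part_reversed, strip_non_word.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "char_filter": {
                "strip_non_word": {
                    "type": "pattern_replace",
                    "pattern": "\\W+",
                    "replacement": "",
                }
            },
            "analyzer": {
                # Same chain as described in the question.
                "part_forward": {
                    "type": "custom",
                    "char_filter": ["strip_non_word"],
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
                # Identical chain plus the reverse token filter.
                "part_reversed": {
                    "type": "custom",
                    "char_filter": ["strip_non_word"],
                    "tokenizer": "keyword",
                    "filter": ["lowercase", "reverse"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "part_number": {
                "type": "text",
                "analyzer": "part_forward",
                "fields": {
                    "reversed": {"type": "text", "analyzer": "part_reversed"}
                },
            }
        }
    },
}


def normalize(user_input):
    """Client-side approximation of the char filter and lowercase filter."""
    return re.sub(r"\W+", "", user_input).lower()


def prefix_query(user_input):
    """Prefix search against the forward field."""
    return {"prefix": {"part_number": {"value": normalize(user_input)}}}


def suffix_query(user_input):
    """Suffix search expressed as a prefix search on the reversed sub-field."""
    return {"prefix": {"part_number.reversed": {"value": normalize(user_input)[::-1]}}}


def prefix_or_suffix_query(user_input):
    """Match values that either start or end with the input."""
    return {
        "bool": {
            "should": [prefix_query(user_input), suffix_query(user_input)],
            "minimum_should_match": 1,
        }
    }
```

For example, `suffix_query("812abcdefg")` builds a prefix query for `gfedcba218` on the reversed sub-field, which is the lookup described earlier in the thread.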

venkata_sreekanth_bh · February 4, 2017, 12:02am

Ngrams in reverse are likely to score other tokens higher. Shingles are not an option since there are no stop words. The terms are like part numbers and have no stop words.

I hope there is some way to search substrings without using regex or wildcard. Also, I am not able to embed a regex or wildcard query in a bool query, in the sense that it has no effect.

forloop · February 6, 2017, 5:26am

You don't need stop words to use shingles

venkata_sreekanth_bh · February 7, 2017, 6:47pm

I am using ngram with different min and max gram values to do substring search.
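The exact gram settings were not posted, so the following is only an illustrative sketch in plain Python of how an ngram index with separate minimum and maximum gram sizes supports substring lookup. The function names and the sizes used are assumptions. It presumes the query is kept as a single lowercase token at search time, so a match requires the query length to fall between the two bounds.

```python
def ngrams(term, min_gram, max_gram):
    """Return every substring of term whose length is between
    min_gram and max_gram inclusive."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.add(term[start:start + size])
    return grams


def substring_match(query, indexed_value, min_gram, max_gram):
    """True if the whole query appears among the indexed grams."""
    return query in ngrams(indexed_value, min_gram, max_gram)


# The 10-character query is found when max_gram is at least 10.
print(substring_match("812abcdefg", "1812abcdefg123", 3, 10))  # True
# With max_gram below the query length, no gram is long enough to match.
print(substring_match("812abcdefg", "1812abcdefg123", 3, 8))   # False
# A shorter indexed value that merely shares a prefix does not match.
print(substring_match("812abcdefg", "812c", 3, 10))            # False
```

A wide gap between the two sizes grows the index considerably, and newer Elasticsearch releases limit that gap through the `index.max_ngram_diff` setting, so the bounds are worth choosing against the real query lengths.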

system · March 7, 2017, 6:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
