N-gram Tokenizer by words

In Elasticsearch, the n-gram tokenizer splits a string letter by letter.
For example, "this is a dog" is divided into ['t', 'th', 'thi', 'this', ...etc.].
Is there a way to do n-grams by two words instead?
For example, "this is a dog" would be divided into ['this is', 'is a', 'a dog'].

Welcome!

Have a look at shingles: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html
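
For the two-word split you describe, a shingle token filter with a minimum and maximum shingle size of 2 and unigram output disabled should do it. Here is a minimal sketch; the index, filter, and analyzer names (`my-index`, `two_word_shingles`, `shingle_analyzer`) are just placeholders:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "two_word_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "two_word_shingles"]
        }
      }
    }
  }
}
```

You can then check the output with the `_analyze` API:

```json
POST my-index/_analyze
{
  "analyzer": "shingle_analyzer",
  "text": "this is a dog"
}
```

This should return exactly the tokens `this is`, `is a`, and `a dog`. Setting `output_unigrams` to `false` is what suppresses the single-word tokens; leave it at the default of `true` if you also want `this`, `is`, `a`, and `dog` in the token stream.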

Thank you so much.
That's exactly what I need.
