Looking for a phrase tokenizer or filter like this

fraf · October 5, 2022, 10:00am

Hello searchers,

I'm looking for a tokenizer (or filter) that treats the input phrase like this:

Given the input phrase: "the foxes jumping quickly"

I want to return documents whose field contains all of these words, in any order.
For instance, the following phrase should match: "foo quickly bar foxes foo the bar jumping dot", because it contains all the words.
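
To make the requirement concrete, here is a minimal Python sketch of the matching rule described above. It only illustrates the semantics (all query words present, order ignored) and is not how Elasticsearch implements it; the function name `contains_all_words` is made up for this example, and splitting on whitespace with lowercasing is an assumption.

```python
def contains_all_words(query: str, field_text: str) -> bool:
    """Return True if every whitespace-separated word of `query`
    occurs somewhere in `field_text`, regardless of order."""
    query_words = set(query.lower().split())
    field_words = set(field_text.lower().split())
    # Subset test: every query word must appear among the field words.
    return query_words <= field_words


query = "the foxes jumping quickly"
document = "foo quickly bar foxes foo the bar jumping dot"

print(contains_all_words(query, document))             # True
print(contains_all_words(query, "quickly foxes the"))  # False ("jumping" is missing)
```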

I don't see anything for this in the documentation.

Thank you.

Christian_Dahlqvist · October 5, 2022, 10:09am

That sounds like pretty much the default behaviour for a field mapped as text. If you use a whitespace analyser, or maybe even the standard one (although it may remove stop words like "the" if a stop word list is configured), it will split your text into tokens and index these. You can then search for your string (which will be tokenized at query time) and matching documents will be returned. See this section in the docs for more details. If you require all your tokens to match, you can use e.g. a query string query with minimum_should_match set to 100%.
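
As a rough illustration of that suggestion, the request body for such a search could be built like this. This is a sketch under assumptions: the field name `content` and the helper `all_terms_query` are hypothetical, and the body is only printed as JSON here rather than sent to a cluster.

```python
import json


def all_terms_query(field: str, text: str) -> dict:
    """Build a search body using a query_string query in which
    every token of `text` must match (minimum_should_match = 100%)."""
    return {
        "query": {
            "query_string": {
                "query": text,
                "default_field": field,          # hypothetical field name supplied by the caller
                "minimum_should_match": "100%",  # require all tokens, in any order
            }
        }
    }


body = all_terms_query("content", "the foxes jumping quickly")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a search request against the index. Keep in mind that a query string query interprets operators and reserved characters in the input, so raw user input may need escaping.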

maryna.cherniavska · October 5, 2022, 1:55pm

Hi @fraf, I'd support what @Christian_Dahlqvist said, but I'd also add that you might want to use the simple analyzer instead of the standard one. It splits the text into terms and lowercases them, but it does not remove stop words (like "the", "a", etc.). Have a look at the built-in analyzers Elastic provides.
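
For reference, a sketch of how the simple analyzer might be wired in: one body that maps a text field to it when creating an index, and one body for the `_analyze` API to inspect the tokens it produces. The field name `content` and both helper names are assumptions made for this example, not something from the thread.

```python
import json


def text_field_mapping(field: str, analyzer: str = "simple") -> dict:
    """Index-creation body mapping `field` as text with the given built-in analyzer."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer}
            }
        }
    }


def analyze_request(text: str, analyzer: str = "simple") -> dict:
    """Body for the _analyze API, which returns the tokens an analyzer produces for `text`."""
    return {"analyzer": analyzer, "text": text}


# "content" is a hypothetical field name.
print(json.dumps(text_field_mapping("content"), indent=2))
print(json.dumps(analyze_request("the foxes jumping quickly"), indent=2))
```

Sending the second body to `_analyze` is a quick way to check whether "the" survives tokenization with a given analyzer before committing to a mapping.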

fraf · October 5, 2022, 2:51pm

You are both right! Simple query string is the way to go, combined with a standard tokenizer.

Thank you both very much.
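
For anyone landing here later, a minimal sketch of what a simple query string search requiring all terms might look like. The exact query used in this thread was not posted, so this is an assumption: the field name `content` and the helper name are hypothetical, and `default_operator` set to `and` is used as one way to require every term (setting `minimum_should_match` to `100%` is another).

```python
import json


def simple_all_terms_query(field: str, text: str) -> dict:
    """Build a search body using simple_query_string where all terms must match."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": [field],          # hypothetical field name supplied by the caller
                "default_operator": "and",  # every term is required; order does not matter
            }
        }
    }


body = simple_all_terms_query("content", "the foxes jumping quickly")
print(json.dumps(body, indent=2))
```

Unlike the query string query, simple query string does not return errors for invalid syntax; it ignores the invalid parts, which tends to make it a safer choice for raw user input.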
