Looking for a phrase tokenizer or filter like this

Hello searchers,

I'm looking for a tokenizer (or filter) that treats the input phrase like this:

Given the input phrase: "the foxes jumping quickly"

I want to return documents whose field contains all of these words, in any order.
For instance, the following phrase should match: "foo quickly bar foxes foo the bar jumping dot", because it contains all the words.

I don't see anything for this in the documentation.

Thank you.

That sounds pretty much like the default behaviour for a field mapped as text. If you use the whitespace analyser, or maybe even the standard one (although it may remove stopwords like "the", depending on how it is configured), it will split your text into tokens and index them. You can then search for your string, which will be tokenized the same way at query time, and any matching documents will be returned. See this section in the docs for more details. If you require all of your tokens to match, you can use e.g. a query string query with minimum_should_match set to 100%.
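As a rough illustration of the last point, here is a minimal Python sketch that only builds the JSON body for such a query string query. The field name "content" and the helper name build_all_terms_query are assumptions made for the example, not something from this thread; sending the body to a cluster is left out.

```python
import json


def build_all_terms_query(field: str, phrase: str) -> dict:
    """Build a query_string search body that requires every term to match.

    `field` is a hypothetical text field name; adjust it to your mapping.
    """
    return {
        "query": {
            "query_string": {
                "query": phrase,
                "default_field": field,
                # 100% means every optional term clause must match,
                # regardless of the order of the terms in the document.
                "minimum_should_match": "100%",
            }
        }
    }


body = build_all_terms_query("content", "the foxes jumping quickly")
print(json.dumps(body, indent=2))
```

The printed body could be used as the request body of a _search call against the index that holds the field.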

Hi @fraf, I'd support what @Christian_Dahlqvist said, but I'd also add that you might want to use the simple analyzer instead of the standard one. It splits the text into terms and lowercases them, but it does not remove stop words (like "the", "a", etc.). Have a look at the built-in analyzers that Elastic provides.
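To make the effect of the simple analyzer concrete, here is a small Python sketch. It shows a mapping that assigns the simple analyzer to a text field (the field name "content" is an assumption), plus a local approximation of what that analyzer does: split on non-letter characters and lowercase. The approximation is only for illustration and is not Elasticsearch's actual implementation.

```python
import re

# Mapping body that assigns the built-in "simple" analyzer to a
# hypothetical text field called "content".
SIMPLE_ANALYZER_MAPPING = {
    "mappings": {
        "properties": {
            "content": {"type": "text", "analyzer": "simple"}
        }
    }
}


def simple_tokens(text: str) -> list[str]:
    """Approximate the simple analyzer: keep runs of letters, lowercased."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def contains_all_terms(query: str, document: str) -> bool:
    """Return True if every query token occurs in the document, in any order."""
    return set(simple_tokens(query)) <= set(simple_tokens(document))


print(contains_all_terms(
    "the foxes jumping quickly",
    "foo quickly bar foxes foo the bar jumping dot",
))
```

Because stop words such as "the" are kept as tokens, a document that lacks "the" would not satisfy an all-terms requirement under this analyzer.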


You are both right! Simple query string is the way to go, combined with a standard tokenizer.
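For anyone landing here later, a minimal sketch of what such a simple query string request body might look like, written in Python. The exact query used in the end is not shown in this thread, so the field name "content", the helper name, and the choice of default_operator "and" to require all terms are assumptions.

```python
import json


def build_simple_query_string(fields: list[str], phrase: str) -> dict:
    """Build a simple_query_string body where all terms are required."""
    return {
        "query": {
            "simple_query_string": {
                "query": phrase,
                "fields": fields,
                # "and" makes every term mandatory; word order is ignored.
                "default_operator": "and",
            }
        }
    }


request_body = build_simple_query_string(["content"], "the foxes jumping quickly")
print(json.dumps(request_body, indent=2))
```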

Thank you both very much.
