I am looking for a tokenizer (or filter) that treats the input phrase like this:
Given the input phrase: "the foxes jumping quickly"
I want to return documents whose field contains all of these words, in any order.
For instance, the following phrase should match: "foo quickly bar foxes foo the bar jumping dot", because it contains all the words.
I can't find anything for this in the documentation.
That sounds pretty much like the default behaviour for a field mapped as text. If you use a whitespace analyser, or maybe even the standard one (although it may remove stopwords like "the"), it will split your text into tokens and index these. You can then search for your string (which will be tokenized at query time), and matching documents will be returned. See this section in the docs for more details. If you require all your tokens to match, you can use e.g. a query string query with minimum_should_match set to 100%.
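As a rough illustration of that last suggestion, here is a minimal sketch in Python that only builds the request body for a query string query with minimum_should_match set to 100%. The field name "body" and the helper name are made up for the example, and the body would still have to be sent to the _search endpoint with whatever client you use.

```python
import json


def build_all_terms_query(field: str, phrase: str) -> dict:
    """Build a search body that asks for every token of `phrase` to match.

    `field` is a hypothetical text field name; replace it with your own.
    """
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": phrase,
                # Every optional clause (one per token) has to match.
                "minimum_should_match": "100%",
            }
        }
    }


if __name__ == "__main__":
    body = build_all_terms_query("body", "the foxes jumping quickly")
    print(json.dumps(body, indent=2))
```

Keep in mind that the query string syntax gives special meaning to some characters, so raw user input may need escaping. A match query with operator set to and is another way to require all terms.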
Hi @fraf, I'd support what @Christian_Dahlqvist said, but I'd also add that you might want to use the simple analyzer instead of the standard one. It splits the text into terms and lowercases them, but it does not remove stop words (like "the", "a", etc.). Have a look at the built-in analyzers Elasticsearch provides.
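To make the idea concrete, here is a small Python sketch. It builds an index body that maps a text field to the simple analyzer, and it includes a local approximation of what that analyzer does (split on non-letter characters, lowercase) so the "all words, any order" check can be tried without a cluster. The field name "body" and the helper names are assumptions for the example, and the tokenizer below is only an approximation, not the real analyzer.

```python
import re


def build_index_body(field: str) -> dict:
    """Index body mapping a hypothetical text field to the simple analyzer."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "simple"}
            }
        }
    }


def simple_tokens(text: str) -> list:
    """Approximate the simple analyzer: runs of letters, lowercased."""
    return re.findall(r"[^\W\d_]+", text.lower())


def contains_all_terms(query: str, document: str) -> bool:
    """True if every query token occurs in the document, in any order."""
    return set(simple_tokens(query)) <= set(simple_tokens(document))


if __name__ == "__main__":
    doc = "foo quickly bar foxes foo the bar jumping dot"
    print(build_index_body("body"))
    print(contains_all_terms("the foxes jumping quickly", doc))
```

Note that stop words such as "the" are kept as ordinary tokens here, which is why the example phrase from the question can match on all four words.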