Hi,
We're using Elasticsearch for providing content search for users on our website, and I have a question regarding combined/compound words.
In our language (Norwegian), compound words are very common, and a typical mistake is to write the parts as separate words rather than as one word (pretty much the opposite of English). For example, people often write 'hoste saft' instead of 'hostesaft'.
We would like to account for this, so that when users search for 'hoste saft' (two words) we return hits for 'hostesaft' (as well as 'hoste saft', I guess).
I found the compound word token filter in the docs, but I'm not sure it is what I want. It also seems infeasible to provide a word list covering every word that should match, as there are a lot of them.
Is the best approach simply to check for whitespace in the query string and add a second query (with the whitespace removed) to our should clause to account for this?
That is our current solution and it works, but I would assume that it increases the cost of the query somewhat (2x?).
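For reference, here is a minimal sketch of roughly what we do today, written in Python. The function name `build_compound_query` and the field name `content` are placeholders for illustration, not our real code or mapping; the idea is just a bool query with one match clause for the input as typed and, if the input contains whitespace, a second match clause for the joined form.

```python
def build_compound_query(user_input: str, field: str = "content") -> dict:
    """Build an Elasticsearch bool/should query body.

    `field` is a hypothetical field name; replace it with your own mapping.
    """
    # Split on any whitespace so repeated spaces or tabs are handled too.
    terms = user_input.split()

    # Always search for the input as the user typed it (normalized spacing).
    should = [{"match": {field: " ".join(terms)}}]

    # If the user typed more than one word, also try the joined compound.
    if len(terms) > 1:
        should.append({"match": {field: "".join(terms)}})

    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_compound_query("hoste saft"), indent=2))
```

Searching for 'hoste saft' yields two should clauses ('hoste saft' and 'hostesaft'), while a single-word search yields only one, so the extra clause is only added when there is whitespace to remove.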
All help is appreciated.