I'm having some difficulty matching words that are contained inside larger words.
E.g. Elasticsearch: if I search for "search" it should match
"Elasticsearch".
In German we have a lot of such words, like Seidenchiffonbluse.
Now I want to match all words containing "bluse".
Now I have read a lot of examples about partial word matching using ngrams,
but to me this doesn't seem like the right way to go.
I don't want to match "blu", "blus" or anything of the like.
The best way would be to provide a real dictionary of words and let
Elasticsearch split them into words/tokens.
Are there any pre-defined language settings or dictionaries inside ES?
We store many language-dependent texts inside one document, which looks like
this:

document : {
  EN : { title : "english title" },
  DE : { title : "german title" },
  ....
}
You're looking for the compound word token filter.
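A rough sketch of how that filter could be wired up with the dictionary_decompounder token filter. The index name, the analyzer/filter names and the tiny word list below are only placeholder examples; in practice you would point word_list_path at a full German dictionary file instead of listing words inline:

# Sketch only: index name, analyzer/filter names and word list are made up.
curl -XPUT 'http://localhost:9200/shop' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "german_decompounder": {
          "type": "dictionary_decompounder",
          "word_list": ["seide", "chiffon", "bluse"]
        }
      },
      "analyzer": {
        "german_compound": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "german_decompounder"]
        }
      }
    }
  }
}'

With this analyzer, "Seidenchiffonbluse" should be indexed as the original token plus the sub-tokens seide, chiffon and bluse, so applying it to the DE.title field should let a match query for "bluse" find documents whose title contains "Seidenchiffonbluse".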