Best practices for matching subwords in foreign languages

Hi,

I'm having some difficulty matching words that are embedded in larger words.
E.g. Elasticsearch: if I search for "search", it should match
"Elasticsearch".
German has a lot of such compound words, like Seidenchiffonbluse.
Now I want to match all words containing "bluse".

I have read a lot of examples about partial word matching using ngrams,
but this doesn't seem like the right approach to me:
I don't want to match "blu", "blus" or anything like that.
The best way would be to provide a real dictionary of words and let
Elasticsearch split compound words into tokens.
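To make the difference concrete, here is a minimal plain-Python sketch (not Elasticsearch code) contrasting n-grams with dictionary-based splitting. The helper names `ngrams` and `decompound`, the n-gram sizes, and the three-word German list are hypothetical; the decompounder's size thresholds are assumed to mirror the documented defaults of the compound word token filter, and lowercasing before lookup is an assumption.

```python
from typing import List, Set


def ngrams(word: str, min_gram: int = 3, max_gram: int = 5) -> List[str]:
    """Every character n-gram of `word` from min_gram to max_gram long (sizes are arbitrary)."""
    return [
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    ]


def decompound(
    token: str,
    dictionary: Set[str],
    min_word_size: int = 5,
    min_subword_size: int = 2,
    max_subword_size: int = 15,
) -> List[str]:
    """Simplified dictionary decompounder: keep the token, add every dictionary word inside it."""
    word = token.lower()  # assumption: lookups are done on lowercased text
    out = [token]  # the original token is always kept
    if len(word) < min_word_size:
        return out  # too short to be split
    for start in range(len(word) - min_subword_size + 1):
        for size in range(min_subword_size, max_subword_size + 1):
            end = start + size
            if end > len(word):
                break
            if word[start:end] in dictionary:
                out.append(word[start:end])
    return out


german_words = {"seide", "chiffon", "bluse"}  # hypothetical word list

print(ngrams("bluse"))  # ['blu', 'lus', 'use', 'blus', 'luse', 'bluse']
print(decompound("Seidenchiffonbluse", german_words))
# ['Seidenchiffonbluse', 'seide', 'chiffon', 'bluse']
```

The n-gram version produces fragments such as "blu" whether or not they are real words, while the dictionary version only emits substrings that appear in the word list, which is the behaviour asked for here.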

Are there any pre-defined language settings or dictionaries inside ES?
We store many language-dependent texts inside one document, which looks like
this:
document : { EN : { title : "english title" }, DE : { title : "german title" }, ....}
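One way the per-language layout above could be wired to the compound word token filter is sketched below as a Python dict that prints an index-creation body as JSON. The analyzer and filter names (`german_decompound`, `german_decompounder`) and the word list are hypothetical; `dictionary_decompounder`, `word_list`, the `standard` tokenizer, the `lowercase` filter and the built-in `english` analyzer are Elasticsearch names. The mapping assumes the typeless layout and `text` field type of recent releases; releases from 2013 used a mapping type name and `string` fields instead, so treat this as a sketch to adapt to your version.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; the type is the built-in dictionary decompounder.
                "german_decompounder": {
                    "type": "dictionary_decompounder",
                    "word_list": ["seide", "chiffon", "bluse"],  # hypothetical word list
                }
            },
            "analyzer": {
                # Lowercase first so the dictionary lookups see lowercased tokens.
                "german_decompound": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "german_decompounder"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Each language object gets its own analyzer for its title field.
            "EN": {"properties": {"title": {"type": "text", "analyzer": "english"}}},
            "DE": {"properties": {"title": {"type": "text", "analyzer": "german_decompound"}}},
        }
    },
}

print(json.dumps(index_body, indent=2))
```

For a real dictionary, the filter's `word_list_path` setting can point at a word-list file instead of listing the words inline.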

Would appreciate any help.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You're looking for the compound word token filter:

http://www.elasticsearch.org/guide/reference/index-modules/analysis/compound-word-tokenfilter.html

clint

This looks good.

Thanks

On Thursday, 21 March 2013 16:53:04 UTC+1, Clinton Gormley wrote:

You're looking for the compound word token filter:

http://www.elasticsearch.org/guide/reference/index-modules/analysis/compound-word-tokenfilter.html

clint