Decompounder in query_string analyzer

Hi everyone,

I'm building a search engine for a German website and therefore have to
deal with compound word filters...

The main problem currently is compound nouns that are sometimes written as
one word and sometimes divided by dashes, e.g. "Schlossbergtunnel" and
"Schlossberg-Tunnel". A query for either "Schlossbergtunnel" or
"Schlossberg-Tunnel" (or "Schlossberg Tunnel") should match both
"Schlossbergtunnel" and "Schlossberg-Tunnel".

My current approach is to use dictionary_decompounder with "schlossberg"
and "tunnel" provided in the word list:

    index:
      analysis:
        filter:
          decompound_filter:
            type: dictionary_decompounder
            word_list: ["schlossberg", "tunnel"]

This works well for the indexer. But the problem is the analyzer used for
the query_string query:

  • If I don't include the decompound filter in the query_string analyzer,
    a query for "schlossbergtunnel" does not find documents containing
    "Schlossberg-Tunnel".
  • If the decompounder is included in the query analyzer, the query for
    "schlossbergtunnel" matches "Schlossberg-Tunnel". But it also matches
    documents that contain only "Schlossberg" or only "Tunnel", which is not
    what I want. It seems the tokens emitted by the decompound filter are
    combined with OR.
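
To make the second case concrete, here is a toy model of the behaviour in plain Python. It is only a sketch and not how Elasticsearch is implemented: it assumes the decompounder keeps the original token and adds every dictionary word found inside it, that hyphens split tokens, and that everything is lowercased. The function names are made up for illustration.

```python
WORD_LIST = ["schlossberg", "tunnel"]

def decompound(token, word_list=WORD_LIST):
    """Toy decompounder: keep the token, add each dictionary word it contains."""
    token = token.lower()
    return [token] + [w for w in word_list if w in token and w != token]

def analyze(text, word_list=WORD_LIST):
    """Split on whitespace and hyphens (an assumption), then decompound."""
    tokens = []
    for raw in text.replace("-", " ").split():
        tokens.extend(decompound(raw, word_list))
    return tokens

def matches_any(query, document):
    """OR semantics: a single shared token is enough for a match."""
    return bool(set(analyze(query)) & set(analyze(document)))

# The wanted match:
print(matches_any("schlossbergtunnel", "Schlossberg-Tunnel"))
# The unwanted match: the document only contains "Tunnel".
print(matches_any("schlossbergtunnel", "Der Tunnel ist gesperrt"))
```

Under these assumptions both calls report a match, which is exactly the over-matching described above.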

In other words: The query string "schlossbergtunnel" using the decompound
filter behaves like "schlossbergtunnel OR schlossberg OR tunnel", but what
I need is "schlossbergtunnel OR (schlossberg AND tunnel)".
Any ideas how to achieve this?
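
For illustration, the logic I am after could be written by hand as a bool query assembled on the client side. This is a minimal sketch, not a tested solution: the field name `title` and the helpers `split_compound` and `build_query` are hypothetical, the splitting is a naive substring check against the same word list, and the term queries assume the indexed tokens are lowercased.

```python
import json

WORD_LIST = ["schlossberg", "tunnel"]
FIELD = "title"  # hypothetical field name

def split_compound(term, word_list=WORD_LIST):
    """Naive split: every dictionary word contained in the term."""
    term = term.lower()
    return [w for w in word_list if w in term and w != term]

def build_query(term, field=FIELD):
    """Build: term OR (part1 AND part2 AND ...) as a bool query body."""
    term = term.lower()
    parts = split_compound(term)
    should = [{"term": {field: term}}]
    if parts:
        # All parts must occur together, not just one of them.
        should.append({"bool": {"must": [{"term": {field: p}} for p in parts]}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

print(json.dumps(build_query("Schlossbergtunnel"), indent=2))
```

This gives the semantics I want, but it moves the decompounding out of the analyzer and into application code, which I would prefer to avoid if query_string can be made to do it.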

Thanks for any hints,
Christoph
