Analyzers and char_filters o_0 creepy outputs

(georgi.mateev) #1

Hi! This is a sample setup, close to what I am working with

As you can see, I am trying to remove the hyphens from all words, so that
words like "hand-made" are indexed as "handmade". The goal is to make a
search for "handmade" find all documents containing "hand-made" and vice
versa. For some reason it doesn't work, though :frowning:
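
A minimal sketch of the kind of analysis settings described above, written as a Python dict that could be sent as the index-creation body. It is not the actual configuration from this thread: the names `strip_hyphens` and `hyphenless` are placeholders, and the choice of a `pattern_replace` char filter with the `standard` tokenizer and a `lowercase` filter is an assumption.

```python
import json

# Hypothetical names: "strip_hyphens" and "hyphenless" are placeholders,
# not the identifiers used in the real mapping.
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Assumed approach: delete every hyphen before tokenizing.
                "strip_hyphens": {
                    "type": "pattern_replace",
                    "pattern": "-",
                    "replacement": "",
                }
            },
            "analyzer": {
                "hyphenless": {
                    "type": "custom",
                    "char_filter": ["strip_hyphens"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

# Print the JSON body so it can be inspected or pasted into a request.
print(json.dumps(index_settings, indent=2))
```

For the hyphenated and unhyphenated spellings to match each other, the field would have to use such an analyzer both when indexing and when searching.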

I have also attached 3 sample queries. The expected result would be for all
of them to return the same result set.

  1. Astonishingly, a search for "Chemie-ingenieur" finds 2 results, but a
    search for "Chemieingenieur" finds none. This is pretty creepy to me, since
    the char_filter is supposed to strip the hyphens prior to tokenizing in the
    indexing process.

  2. Another creepy fact is that if I specify the searchAnalyzer explicitly,
    I find no results (see query 3) from this document set

  3. Moreover, the Analyze API shows that the search term "Chemie-ingenieur"
    gets translated to "Chemieingenieur" using this analyzer.

  4. And the creepiest fact is that when I run these queries with the
    actual index data (800+ documents), I get 17 results for "Chemie-ingenieur"
    and 22 for "Chemieingenieur", and NONE OF THEM OVERLAP. I.e. I have a
    total of 39 documents that should be matching either of the queries. Some
    of the documents that match "Chemie-ingenieur" actually don't contain the
    word with the hyphen. So I would expect these documents to be contained in
    both result sets, maybe with a different relevancy score. This is, however,
    not the case.
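
The expectation behind points 1 to 4 can be sketched in plain Python. This is only a rough stand-in for the analysis chain (char filter, then tokenizer, then lowercasing), not Elasticsearch itself, and the function name `analyze` is made up for illustration.

```python
import re


def analyze(text: str) -> list[str]:
    """Rough stand-in for the chain: strip hyphens, tokenize, lowercase."""
    stripped = text.replace("-", "")        # char filter step: drop hyphens
    tokens = re.findall(r"\w+", stripped)   # crude tokenizer: runs of word chars
    return [token.lower() for token in tokens]


for term in ("Chemie-ingenieur", "Chemieingenieur"):
    print(term, "->", analyze(term))
```

If the same chain ran at both index time and search time, both spellings would reduce to the same single token, so both queries would be expected to return the same documents.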

Please help me get over this, I have been struggling with it for a full
week already. I would be very grateful for an explanation too, apart from
a solution, since the output is very different from what I expect based on
my understanding, which means that I don't really understand the system.

P.S. Please focus on the actual problem and let's not discuss the mapping
in detail. The version I have pasted is pretty different from what I
started with, due to the trial-and-error approach I have been using for
almost a week.

Thanks sincerely,


(system) #2