Hi! This is a sample setup, close to what I am working with.
As you can see, I am trying to remove the hyphens from all words, so that
words like "hand-made" are indexed as "handmade". The goal is to make a
search for "handmade" find all documents containing "hand-made" and vice
versa.
For some reason it doesn't work, though.
I have also attached 3 sample queries. The expected result would be for all
of them to return the same result set.
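To make the expectation concrete, here is a minimal Python sketch of the
behaviour I am after. It is only a stand-in, not Elasticsearch code: the
helpers `strip_hyphens`, `analyze` and `matches` are hypothetical, and they
assume a hyphen-stripping char filter followed by whitespace tokenizing and
lowercasing, which may not be exactly what the real analyzer does.

```python
def strip_hyphens(text):
    """Mimic a char filter that deletes hyphens before tokenizing."""
    return text.replace("-", "")


def analyze(text):
    """Hypothetical analyzer: char filter, whitespace split, lowercase."""
    return [token.lower() for token in strip_hyphens(text).split()]


def matches(query, document):
    """True if every analyzed query term is among the document's terms.

    The same analysis is applied on both sides (index time and search time).
    """
    indexed_terms = set(analyze(document))
    return all(term in indexed_terms for term in analyze(query))


# Both spellings analyze to the same single term, so under these
# assumptions they should match exactly the same documents.
assert analyze("Chemie-ingenieur") == analyze("Chemieingenieur")
assert matches("handmade", "a hand-made chair")
assert matches("hand-made", "a handmade chair")
```

If the index-time and search-time analysis really agree like this, the
three queries cannot return different result sets, which is why the
results I actually get puzzle me.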
- Astonishingly, a search for "Chemie-ingenieur" finds 2 results, but a
  search for "Chemieingenieur" finds none. This is pretty creepy to me, since
  the char_filter is supposed to strip the hyphens prior to tokenizing in the
  indexing process.
- Another creepy fact is that if I specify the searchAnalyzer explicitly,
  I find no results (see query 3) from this document set.
- Moreover, the Analyze API shows that the search term "Chemie-ingenieur"
  gets translated to "Chemieingenieur" using this analyzer.
- And the creepiest fact is that when I run these queries against the
  actual index data (800+ documents), I get 17 results for "Chemie-ingenieur"
  and 22 for "Chemieingenieur", and the two result sets do not overlap at
  all. I.e. I have a total of 39 documents that should match either
  query. Some of the documents that match "Chemie-ingenieur" don't actually
  contain the word with the hyphen, so I would expect these documents to be
  in both result sets, maybe with a different relevancy score. This is,
  however, not the case.
Please help me get over this; I have been struggling with it for a full
week already. I would be very grateful for an explanation too, apart from
a solution, since the output is very different from what my understanding
leads me to expect, which means that I don't really understand the system.
P.S. Please focus on the actual problem and let's not discuss the mapping
in detail. The version I have pasted is pretty different from what I
started with, due to the trial-and-error approach I have been
using for almost a week.
Thanks sincerely,
Georgi