Search for compounded words


#1

If I use the Compound Word Token Filter on the search term "shoerack", I get ["shoerack", "shoe", "rack"], all at the same position in the token stream. Because of that, the search query is built with an OR operator between the words, as ("shoerack" OR "shoe" OR "rack"). What I need is only ("shoe" AND "rack").

I believe the only way to achieve this is to have a Compound Word Token Filter that:

  • Doesn't output the original word (in my case it would output only ["shoe", "rack"])

  • Increases the position for every word produced by decompounding (in my case it would output ["shoe(1)", "rack(2)"]).
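To make the two requirements concrete, here is a rough Python simulation of the token streams involved. It is only a sketch: it does not call Elasticsearch or Lucene, the function names and the tiny dictionary are made up for illustration, and the substring matching is a naive stand-in for whatever the real filter does.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int]  # (term, position in the token stream)


def decompound_same_position(word: str, dictionary: Set[str], position: int = 1) -> List[Token]:
    """Simulate the behaviour described above: the original word and every
    dictionary subword are emitted at the same position."""
    tokens = [(word, position)]
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part != word and part in dictionary:
                tokens.append((part, position))
    return tokens


def decompound_incrementing(word: str, dictionary: Set[str], position: int = 1) -> List[Token]:
    """Simulate the desired behaviour: drop the original word and give each
    subword its own, increasing position."""
    subwords = [term for term, _ in decompound_same_position(word, dictionary, position)[1:]]
    if not subwords:
        # Nothing to split, so keep the word itself.
        return [(word, position)]
    return [(term, position + offset) for offset, term in enumerate(subwords)]


# Hypothetical two-word dictionary, for illustration only.
words = {"shoe", "rack"}
print(decompound_same_position("shoerack", words))
print(decompound_incrementing("shoerack", words))
```

The first call shows why the terms end up ORed together (three tokens sharing one position); the second shows the stream I am hoping for.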

Is this possible in Elasticsearch? I could not find anything in the documentation.


(Ivan Brusic) #2

Positions should only matter if you are executing a phrase or span query. I
am assuming your problem is that your query is matching content that
contains only some of the terms (only "shoe" or only "rack") because of the
OR operator.

What type of query are you executing? Is the search term "shoerack"?
Creating your own query of ("shoe" AND "rack") should work.
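As a minimal sketch of what such a hand-built query could look like, the snippet below assembles a bool query with one must clause per term as a plain Python dict. The field name "text" and the helper name and_query are assumptions for illustration, and the terms are assumed to have been split already on the application side.

```python
import json
from typing import Dict, List


def and_query(field: str, terms: List[str]) -> Dict:
    """Build a bool query in which every term must match (logical AND)."""
    return {
        "query": {
            "bool": {
                # One term clause per subword; all of them are required.
                "must": [{"term": {field: term}} for term in terms]
            }
        }
    }


# "text" is a hypothetical field name; the split into subwords is done by the caller.
body = and_query("text", ["shoe", "rack"])
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a search request. Using must clauses rather than should clauses is what turns the OR into an AND.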

Cheers,

Ivan


#3

Hi Ivan,

Yes, my search term is "shoerack" and I want to find all documents that have "shoe" AND "rack" in the text. I could create my own query (of "shoe" AND "rack"), but that would require me to split "shoerack" myself by building a custom decompounder in my web application. I'm actually dealing with German here, and there are many search terms that should be decompounded in this way.
I would rather let Elasticsearch do it, since it's probably faster that way (and certainly more convenient).

Cheers,

Kresimir


(Bart Kooijman) #4

Hello Kresimir,

Did you ever solve this problem in Elasticsearch instead of your web application?

Regards,

Bart.

