Stop words and positions: this breaks some use cases


(Alexgarel) #1

Hello,

We are switching from Solr to ElasticSearch.

I have read all the topics about stop words in the guide and, of course, the reference for the Stop Token Filter.

The choice has been made to keep word positions when stop words are eliminated. I perfectly understand the reasoning behind it, as it has to do with performance and certain use cases. But I think it misses a use case.
We remove stop words not to increase performance, but to increase fuzziness in a meaningful way. The documentation recommends using proximity search for that instead of phrase search, but this is not the same thing.

In our case, we use stop word removal so that "maison de campagne" matches "maison à la campagne", "maison de campagne" and "maison en campagne".
With ES, "maison de campagne" won't match "maison à la campagne", and
"maison campagne" won't match any of those.
Proximity search is not the same, as it would also match "maison hors campagne".
By the way, "maison de campagne", because of positions and stop word elimination, also matches "maison hors campagne", which is strange.
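
To make the behaviour concrete, here is a small Python simulation of phrase matching with and without position gaps. It is only a sketch of the idea, not how ES or Solr are implemented: the stop list, `analyze` and `phrase_match` are names I made up for illustration, and the tokenizer is a plain whitespace split.

```python
# Illustrative stop list (an assumption, not the real French list).
STOP_WORDS = {"de", "à", "la", "en"}


def analyze(text, keep_positions):
    """Return (term, position) pairs after stop word removal.

    keep_positions=True leaves a gap where a stop word was removed;
    keep_positions=False closes the gap.
    """
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOP_WORDS:
            if keep_positions:
                position += 1  # the removed word still consumes a position
            continue
        tokens.append((word, position))
        position += 1
    return tokens


def phrase_match(query, document, keep_positions):
    """Exact phrase match: every query term must sit at the same
    relative offset in the document as in the query."""
    q = analyze(query, keep_positions)
    d = analyze(document, keep_positions)
    if not q:
        return False
    first_term, first_pos = q[0]
    for term, start in d:
        if term != first_term:
            continue
        if all((t, start + (p - first_pos)) in d for t, p in q):
            return True
    return False


for doc in ["maison à la campagne", "maison en campagne", "maison hors campagne"]:
    print(doc,
          phrase_match("maison de campagne", doc, keep_positions=True),
          phrase_match("maison de campagne", doc, keep_positions=False))
```

With gaps kept, the query becomes "maison, anything, campagne", which is why it rejects "maison à la campagne" (two words in between) but accepts "maison hors campagne" (any single word fills the gap). With gaps closed, it behaves the way we need.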

The common terms query does not help here either: positions are still retained.

Am I missing something?

For the moment, I am trying to remove stop words with a pattern_replace char filter, with the big drawback that I cannot specify that I want case-insensitive matches.
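
As a rough illustration of what I want the char filter to do, here is a Python stand-in (not the Elasticsearch configuration itself). The stop list and the `strip_stop_words` name are made up for the example, and it relies on Python's `re.IGNORECASE`, which is exactly the option I have not found on the char filter side.

```python
import re

# Illustrative stop list (an assumption, not the real French list).
STOP_WORDS = ["de", "à", "la", "en"]

# Match any stop word as a whole word, plus the whitespace that follows it.
STOP_WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b\s*",
    re.IGNORECASE,
)


def strip_stop_words(text):
    """Remove stop words from the raw text, before any tokenization."""
    return STOP_WORD_PATTERN.sub("", text).strip()


print(strip_stop_words("Maison À La Campagne"))
print(strip_stop_words("maison hors campagne"))
```

Because the words are removed before tokenization, they never consume a position, so the phrase query sees "maison" and "campagne" as adjacent terms.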

