I like this feature! I can see many different use cases for it. I opened
Note, though, that there is a caveat here. Remember that the analyzer is
also applied when indexing data, so 'empires' will be indexed as 'empire'
(with stemming). If you then don't apply any stemming when searching, a
query for "empires" (with a non-stemming analyzer) will not find anything.
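To make the caveat concrete, here is a minimal sketch in plain Python. It does not call Elasticsearch at all: `toy_stem` and `analyze` are hypothetical helpers, and the "stemmer" just strips a trailing 's' as a stand-in for snowball. It only illustrates why a non-stemmed query term misses a stemmed index.

```python
def toy_stem(token: str) -> str:
    """Naive stand-in for a real stemmer (assumption): strip one trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text: str, stem: bool) -> list[str]:
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


# Index time: the stemming analyzer is applied, so only 'empire' is stored.
indexed_terms = set(analyze("empires", stem=True))

# Search time: the same word analyzed with and without stemming.
stemmed_query = analyze("empires", stem=True)      # ['empire']
unstemmed_query = analyze("empires", stem=False)   # ['empires']

stemmed_hit = all(t in indexed_terms for t in stemmed_query)
unstemmed_hit = all(t in indexed_terms for t in unstemmed_query)

print(stemmed_hit, unstemmed_hit)  # True False
```

The unstemmed query looks for the literal term 'empires', which was never written to the index, so it finds nothing.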
On Wed, May 9, 2012 at 9:53 AM, Laser Jesus email@example.com:
I've just created a gist script here: https://gist.github.com/2642394
I set up a fresh index with the snowball stemmer. I then create a
percolator for the term "empire" (with the quotes). I then percolate a
document with the text 'empire', which correctly matches. I then percolate
another document with the text 'empires', and again the percolator matches.
This second example is matching a stemmed version of the original
percolator. However, I was hoping that it wouldn't match, since the
percolator had the search term in quotation marks, indicating the need for
an exact match.
If you search Google for 'car' you will get matches for 'cars'; however, if
you search for "car" (with quotes) you will only get matches for 'car', not
the plural form. I was hoping to get this natural language functionality
out of the box with ES. I'm pretty sure Lucene doesn't natively support
this, so it's a pretty tall order. As I said previously, I can create two
sets of percolators, one with stemming and one without. Then I can
register queries that use quotes with the non-stemmed set and all others
with the stemmed set, then percolate each document against both sets. This
is good enough for the moment, but it would be really great to handle mixed
queries, e.g. '"car" AND fight' matching 'car ... fights', whereby stemming
has been applied to the fight term but not the car term.
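As a rough illustration of the mixed-query behaviour described above, here is a small sketch in plain Python. It is not Elasticsearch or Lucene code: each document is analyzed twice (an "exact" term set and a "stemmed" term set, both hypothetical names), and each query clause is routed to one of the two depending on whether it is quoted. `toy_stem` strips a trailing 's' as a stand-in for a real stemmer, and the parser only understands single-word terms joined by AND.

```python
import re


def toy_stem(token: str) -> str:
    """Naive stand-in for a real stemmer (assumption): strip one trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(text: str) -> dict[str, set[str]]:
    """Analyze the same text twice: once unstemmed, once stemmed."""
    tokens = tokenize(text)
    return {"exact": set(tokens), "stemmed": {toy_stem(t) for t in tokens}}


def parse_query(query: str) -> list[tuple[str, str]]:
    """Split an AND-only query into (field, term) clauses.

    Quoted terms go to the 'exact' field; everything else is stemmed.
    """
    clauses = []
    for part in re.split(r"\s+AND\s+", query.strip()):
        part = part.strip()
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            clauses.append(("exact", part[1:-1].lower()))
        else:
            clauses.append(("stemmed", toy_stem(part.lower())))
    return clauses


def matches(query: str, text: str) -> bool:
    """Return True if every clause finds its term in the routed field."""
    doc = index_document(text)
    return all(term in doc[field] for field, term in parse_query(query))


print(matches('"car" AND fight', "car chase fights"))  # True
print(matches('"car" AND fight', "cars fight"))        # False
print(matches('car AND fight', "cars fight"))          # True
```

The point of the sketch is that the stemming decision is made per clause rather than per query, which is what the two-percolator-set workaround cannot express.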
I'm just wondering if there is a cleaner way to achieve what I want with
the existing codebase, rather than filing a feature request.